preface

MySQL > select * from B+tree; select * from B+tree; In this article, I talked about the data structure B+tree and the working principle of MySQL’s underlying index. How does MySQL store data on a disk?

The storage engine

Baidu Baike defines storage engine as follows: data in MySQL is stored in files (or memory) using various technologies. These different technologies and related functions are called storage engine in MySQL.

Simply put, different storage engines will store our data in different formats. Just like the image has different formats, such as.jpg,.png,.gif and so on…

Literacy: Storage engines work on tables.

There are two storage engines commonly used in MySQL: MyISAM and InnoDB.

Before MySQL 5.5, MyISAM was the default storage engine.

Starting with MySQL 5.5, InnoDB is the default storage engine.

The main difference

MyISAM InnoDB
The transaction Does not support ❌ support
Table/row locks Only the table lock Row locking was also introduced
A foreign key Does not support ❌ Support ✔
The full text indexing Support ✔ Support is available in version 5.6
Read/write speed faster More slowly

MyISAM does not support transactions, whereas InnoDB does. So now InnoDB has become the standard and most mainstream storage engine we use.

Relevant command

Query the storage engines supported by the current database

show engines;

Copy the code

Example Query the default storage engine

show variables like '%storage_engine%';

Copy the code

Query information about the table

show table status like 'the name of the table';

Copy the code

MyISAM

Each MyISAM table is stored on disk as three files. The names of these files begin with the table name and have an extension indicating the file type.

.frm file (frame) stores the table structure;

MYD files (MY Data) store table Data;

MYI files (MY Index) store table indexes.

By default, data in MySQL is stored in the data folder in the installation directory. You can also modify the data by yourself.


Below I create a table T_user_MyISam that uses MyISAM as the storage engine.


MYI files organize indexes in B+tree. The value of the leaf node stores the disk file address in the row where the index resides.


The underlying lookup process:

First, it will determine whether the field in the search condition where is an index field. If so, it will first take this field to the.MYI file through B+tree fast location, starting from the root node location search;

After that, we can locate the record on the index row by taking the disk file address of the index keyword (which is our condition) to the.myd file.

Row data that is logically adjacent to a table is not necessarily physically adjacent on disk.


InnoDB

At the bottom of an InnoDB table, two files are stored in folders.

.frm file (frame) stores the table structure;

Ibd file (InnoDB Data) stores table index + Data.

Next I created a table t_user_Innodb that uses InnoDB as the storage engine.


InnoDB obviously stores indexes and data in a single file. InnoDB tables are also organized using B+tree data structures.

So let’s see how it’s stored.


Ibd stores data in a leaf node of a B+tree that contains the index and other columns in the row where the index resides.

The underlying lookup process:

Ibd file through B+tree to quickly locate the field, starting from the root node.

Return the index key and the other columns of the row where the record resides.


Clustered (clustered) indexes

Clustered indexes: Leaf nodes contain complete data records.

Simply put, the index and all the other columns in the row are grouped together.

Obviously, MyISAM does not have a clustered index, InnoDB does, and InnoDB’s primary key index is a natural clustered index.

Where there are clustered indexes, there are of course non-clustered indexes (sparse indexes). For MyISAM, its index is a non-clustered index. Because its indexes and data are stored in two separate files:.myi for indexes and.myd for data.

Why do DBAs always recommend primary keys in tables and integer increment?

Please evacuate: this is recommended, do not say certain. Must use UUID not to block you 😁

Why do we have primary keys?

Because InnoDB table data must have a B+tree index structure to organize and maintain all the data of our entire table, thus forming.idb files.

What does that have to do with a primary key?

If InnoDB creates a table with no primary key, that table may not have any indexes, MySQL will select all create aggregation indexes that are unique and not null for the first field.

If a field does not have a unique index, an implicit field becomes the table’s clustered index: InnoDB creates a six-byte integer column ROW_ID for us, which increases monotonically as new rows are inserted, and InnoDB aggregates the data in this column.

Tables that use this ROW_ID column all share the same global sequence counter (which is part of the data dictionary). To avoid running out of ROW_ID, it is recommended that a separate primary key field be created in the table.

Why is integer increment recommended?

First, an integer takes up less space than a string and is faster to find than a size than a string. The size of the string must be converted to ASCII before comparison.

If you use autoincrement, you’ll also be more efficient at inserting.

Without auto-increment, it is possible to insert elements in the middle of a B+tree from time to time. When the node position is full, the node has to be split (inefficient) and maintained, and the tree may have to be balanced, another performance consuming operation.

If you increment everything, you’ll always insert elements backwards, so the probability of index nodes splitting is much lower.

Secondary indexes

All indexes except clustered indexes are called secondary indexes, also known as secondary indexes.

Its leaf nodes don’t store all the other columns, just the primary key.


The underlying lookup process:

Each time we look for data, we find the primary key of the corresponding leaf node based on it, and then we look it up in the B+tree of clustered indexes to get the whole record.


Advantages: Consistency and space saving.

The resources

  1. https://blog.jcole.us/innodb/

The last

Today WE talked about how MySQL data is stored in our computer, from the different storage engines to the clustered index and secondary index and how they find data.

We’ll talk about the most used syndication index in our work, and how it optimizes the left-most prefix. Stay tuned!

If this article helped you, please like 👍.

Share technology, hold on, we can win 💪!