“Database Index, Finally understand” introduces why B+ tree is suitable for database Index, database Index is divided into Primary key Index (Primary Index) and Secondary Index (Secondary Index). How InnoDB and MyISAM use B+ trees to implement these two types of indexes, and what is the difference?
Question 1: What is the index structure of MyISAM?
MyISAM’s indexes are stored separately from row records, called UnClustered indexes.
The primary key index is essentially the same as a normal index:
(1) There is a continuous aggregation area to store row records separately;
(2) the leaf node of the primary key index, which stores the primary key and the pointer to the corresponding row record;
(3) the leaf node of the common index, which stores the index column and the pointer of the corresponding row record;
Voice-over: MyISAM tables can have no primary key.
A primary key index and a normal index are two independent index B+ trees. When searching through the index column, the leaf node of the B+ tree is located first, and then the row record is located through the pointer.
For example, MyISAM:
t(id PK, name KEY, sex, flag);
There are four entries in the table:
1, shenjian, m, A
3, zhangsan, m, A
5, lisi, m, A
9, wangwu, f, B
Its B+ tree index structure is shown in the figure above:
(1) Row records are stored separately;
(2) id is PK, there is an index tree of ID, leaves point to row records;
(3) Name is KEY, there is an index tree of name, and the leaf also points to the row record;
Question 2: What is InnoDB’s index structure?
InnoDB primary key indexes are stored together with row records, so they are called Clustered indexes:
(1) There is no separate area to store row records;
(2) the leaf node of the primary key index, which stores the primary key and corresponding row records (instead of Pointers);
Vo: So InnoDB PK queries are very fast.
Because of this feature, InnoDB tables must have clustered indexes:
(1) If the table defines PK, PK is the clustered index;
(2) if the table does not define PK, then the first non-empty unique column is the clustered index;
Otherwise, InnoDB creates a hidden row-ID as the clustered index; \
There can only be one clustered index, because row data can only have one clustered store on physical disk.
InnoDB can have multiple normal indexes, which are different from clustered indexes:
(1) leaf node of normal index, store primary key (also not pointer);
Question 3: Why InnoDB recommends using trend increasing primary keys? \
InnoDB uses trend increasing primary keys to insert records without splitting the index or moving a large number of rows.
Q4: Why shouldn’t InnoDB use long primary keys?
Assume that there is a user-centric scenario, including id number, ID MD5, name, date of birth and other business attributes, these attributes have query requirements, and transaction requirements, must use InnoDB storage engine. At this point, how do you design the data table?
The easiest way to think about design is:
(1) ID card as the primary key;
(2) Index building on other attributes;
user( id_code PK ,\
id_md5(index),
name(index),
birthday(index));
The index tree and row record structure are as follows:
(1) ID_code aggregates indexes and associates row records;
(2) Other indexes, storing id_code attribute values;
Id number ID_code is a relatively long string, each index stores this value, in the case of large amount of data, memory precious, MySQL limited buffer, storage index and data will be reduced, disk I/O probability will increase. \
Voice-over: At the same time, the disk space taken up by the index also increases.
In this case, an id increment column with no service meaning should be added:
(1) Cluster index with id increment column, associated row records;
(2) Other indexes, storing ID values;
user( id PK auto inc ,
id_code(index),
id_md5(index),
name(index),
birthday(index));
As a result, the limited buffer can buffer more index and row data, reducing the frequency of disk I/OS and increasing overall performance.
Why InnoDB should not use long columns as primary keys?
Q5: InnoDB’s normal index stores primary key values. What might be the problem?
Backtable queries may occur when using plain index queries.
What is a back table query?
Again, the above example:
t(id PK, name KEY, sex, flag);
Voice-over: ID is the clustered index, and name is the plain index.
There are four entries in the table:
1, shenjian, m, A
3, zhangsan, m, A
5, lisi, m, A
9, wangwu, f, B
The two B+ tree indexes are shown in the figure above:
(1) Id is PK, clustered index, leaf node stores row records;
(2) Name is KEY, a common index, and the leaf node stores THE PK value, that is, ID;
Since row records cannot be located directly from a normal index, what is the query process for a normal index?
Typically, you need to scan the index tree twice.
Such as:
select id,name,sex from t where name=’lisi’;
How is it implemented?
If the path is pink, it needs to scan the index tree twice:
(1) First locate the primary key value id=5 through the common index;
(2) locate the row record by clustering index;
This is known as a back-table query, which locates the primary key first and then the row record, and has lower performance than a sweep of the index tree.
Question 6: How do I optimize back table queries? \
A common solution is to override indexes.
What is an index Covering?
This concept is not found on the MySQL website.
Are you serious about your studies?
Borrow sqL-server official website statement.
If the Extra field of the output result of EXPLAIN is Using index, index overwriting can be triggered. \
You can obtain all columns of SQL data in a single index tree without returning to the table.
How to implement index coverage?
The common method is to create a joint index for the field to be queried.
For query requirements \
select id,name,sex from t where name=’lisi’; \
A single column index (name) can be upgraded to a federated index (name, sex) to avoid back tables. Voiceover: The attribute sex is no longer queried in the clustered index.
conclusion
MyISAM and InnoDB both use B+ trees for indexing:
(1) MyISAM index and data are stored separately;
(2) The index leaf node of MyISAM stores Pointers, and the primary key index is not much different from the common index;
(3) Unified storage of InnoDB’s aggregated index and row data;
(4) InnoDB’s clustered index stores the data row itself, while normal index stores the primary key;
(5) InnoDB should not use long columns as PK;
(6) InnoDB common index may have back table query, the common solution is to overwrite index;
The Architect’s Path – Share practical architecture articles
Related recommendation: \
Why InnoDB concurrency is so high?
“Database index, finally understood” \
Homework: \
MyISAM or InnoDB for frequent insertion scenarios?
Train of thought is more important than conclusion, hope you have harvest, thank turn.