Anyone who has studied server-side development knows that a program can run without database indexes. Yet every database book and tutorial devotes a great deal of space to indexes, back-end interviews never skip them, and a solid grasp of indexes is often treated as a threshold between junior and senior back-end development. So what role do indexes actually play in server-side programs?

This article is the first in a series of four database indexing articles:

  1. What is a database index? Let the Xinhua Dictionary help you understand

  2. Database index penetration – in-depth

  3. 20 minutes database index design practice – practice

  4. Why is database indexing implemented with a B+ tree? –

This series covers database indexes from theory to practice, a one-stop path from first acquaintance to real understanding. I hope each article deepens your grasp of the topic.

What is a database index?

In a word: a database index is a key technique for speeding up queries over massive amounts of data. Does that sentence not make sense yet? Never mind; read on and you’ll be able to reach that conclusion yourself in 20 minutes.

Let me show you a picture first.

Everyone should recognize this book: learning to use it is one of the first things children are taught in primary school. What does it have to do with database indexes? Don’t worry, let’s turn to page one.

Look at the row of words in the upper right corner: the directory turns out to be the legendary index! As the one-sentence description above says, the purpose of an index is to speed up data queries. So where do we turn first when we look something up in the dictionary? I believe most people turn to the pinyin directory first; after all, many people these days forget how to write a character as soon as they pick up a pen 😂.

A database index works the same way as the pinyin directory: it locks onto the range where the target data lives as quickly as possible. For example, suppose we want to look up the character for “danger”, pronounced xian. After finding the X section of the directory, we scan down in order to the entry for the pinyin xian and read its page number. From the page numbers of this entry and the one after it, we know the character must be between page 519 and page 523, so the search range shrinks to only 4 pages. That is much faster than paging through the dictionary from front to back, which brings us to our first technical term: a full table scan is exactly that front-to-back search.

Sure enough, we find the character for “danger” on page 521.

So now we know what a database index is: a technique that works like a directory to speed up data queries.
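To make the contrast concrete, here is a minimal Python sketch of the two search styles. It is only an illustration of the idea, not how any real database stores its data: the `rows` list, the function names, and every page number except 521 are made up for the example.

```python
from bisect import bisect_left

# Hypothetical dictionary "table": (pinyin, page) rows kept in pinyin order.
# Only page 521 for "xian" comes from the article; the rest is invented.
rows = [("an", 4), ("bao", 17), ("ming", 342), ("xian", 521), ("zhong", 640)]

def full_table_scan(rows, pinyin):
    """Check every row from top to bottom; return (page, rows_examined)."""
    for examined, (key, page) in enumerate(rows, start=1):
        if key == pinyin:
            return page, examined
    return None, len(rows)

def index_lookup(sorted_keys, rows, pinyin):
    """Binary-search the sorted key list, then read the matching row."""
    pos = bisect_left(sorted_keys, pinyin)
    if pos < len(sorted_keys) and sorted_keys[pos] == pinyin:
        return rows[pos][1]
    return None

keys = [key for key, _ in rows]  # the "directory": keys in sorted order

print(full_table_scan(rows, "xian"))     # examines rows one by one
print(index_lookup(keys, rows, "xian"))  # jumps straight to the right spot
```

With five rows the difference is trivial, but the scan grows linearly with the number of rows while the binary search grows only logarithmically, which is why the directory pays off on a thick dictionary.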

What is a joint index?

You have probably seen database indexes that cover multiple fields, such as INDEX idx_test (col_a, col_b). An index that contains multiple fields is called a **joint index** (also known as a composite index). So what does indexing on multiple fields do? Let’s use the Xinhua Dictionary again to see what a joint index is.

The Xinhua Dictionary has another directory called the “radical directory”. As you can see below, to use it we first find the right section according to the number of strokes in the radical, and then find the radical itself within that section. For example, suppose we again want to find the character for “danger”:

After we find the radical, the page number to its right is not the actual page of the character for “danger”. It is the page in the check table where that radical’s section begins. Turning to page 93 of the check table, we then use the number of remaining strokes (7) to look in the “6-8” section and find the character’s actual page number.

In this process we use two directories in order: the “radical directory” and the “check table”. We can also see that the contents of the check table in the figure above are grouped by radical. Taken together, these two parts form the subject of this section: a joint index. That is, the value of the first field (the radical) is looked up in the first-level index to find the position of the corresponding second-level index (the check table page number), and then the value of the second field (the stroke count) is looked up in the second-level index to find the location of the matching data (the actual page number of the character for “danger”).
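The two-level lookup described above can be sketched in Python as a single list sorted by (radical, stroke count). This is an assumption-laden illustration: the radical labels `"ear"` and `"water"`, the `entries` list, and all pages except 521 are hypothetical, and a real database would use a tree rather than a flat list.

```python
from bisect import bisect_left

# Hypothetical check-table entries: (radical, remaining_strokes, page).
# Sorting tuples orders them by radical first, then by stroke count within
# each radical, which is the ordering a joint index on two fields gives.
entries = sorted([
    ("ear", 4, 600), ("ear", 7, 521), ("ear", 9, 310),
    ("water", 3, 180), ("water", 7, 205),
])

def lookup(radical, strokes):
    """Find pages for an exact (radical, strokes) pair using both fields."""
    lo = bisect_left(entries, (radical, strokes))      # first matching entry
    hi = bisect_left(entries, (radical, strokes + 1))  # first entry past them
    return [page for _, _, page in entries[lo:hi]]

print(lookup("ear", 7))
```

Because the list is ordered by radical before stroke count, all entries for one radical sit next to each other, just as the check table groups its contents by radical.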

Leftmost prefix matching

As the radical directory example shows, if we don’t know a character’s radical, the directory is almost impossible to use. In other words, the radical directory cannot be used with only the stroke count (the second field).

This leads to the first rule of joint indexes: the index on a given field (the stroke count) can be used only if all the fields to its left (the radical) are also used in the query. For example, INDEX idx_i1 (col_a, col_b) cannot be used if the query condition is WHERE col_b = 1.

But suppose we know the radical and not the stroke count, for example because we are not sure whether “horizontal, vertical, cursive” counts as one stroke or two. We can still use the radical directory; we just have to read through all the characters listed under that radical in the check table to find the one we want.

This leads to another rule of joint indexes: even if the fields to the right of a given field are not used in the query (the stroke count, to the right of the radical), that field and all the fields to its left can still use the index. For example, if INDEX idx_i2 (col_a, col_b, col_c) exists, a query with the conditions col_a = 1 AND col_b = 2 can still use the index on col_a and col_b.
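One way to watch both rules in action is to ask a database for its query plan. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in; it is not MySQL, and its planner and plan wording differ. The table name `t` and the `payload` column are assumptions added for the example, and the exact plan text varies between SQLite versions, so the code only checks whether the index name appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; "payload" is deliberately left out of the index so
# that SELECT * cannot be answered from the index alone.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, "
    "col_a INT, col_b INT, col_c INT, payload TEXT)"
)
conn.execute("CREATE INDEX idx_i2 ON t (col_a, col_b, col_c)")

def plan(sql):
    """Return SQLite's query plan details for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail

# Leftmost fields present: the planner can search the joint index.
plan_ab = plan("SELECT * FROM t WHERE col_a = 1 AND col_b = 2")
# Leftmost field missing: the planner falls back to scanning the table.
plan_b = plan("SELECT * FROM t WHERE col_b = 1")

print("col_a and col_b:", plan_ab)
print("col_b only:     ", plan_b)
print("index used for col_a and col_b:", "idx_i2" in plan_ab)
print("index used for col_b only:     ", "idx_i2" in plan_b)
```

In MySQL the equivalent check would be an EXPLAIN on the same queries; the output format is different, but the question being asked is the same.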

However, suppose that after determining the radical we do not know whether the rest of the character has two strokes or three. In that case we only need to look in the two-stroke and three-stroke parts of that radical’s section, which means we are still using the contents of the check table. Therefore, the index can also be used for queries with range conditions.

Finally, we can state the leftmost prefix matching principle in full: when an SQL query runs against a joint index, the index accelerates only the run of fields that starts at the leftmost field of the index and continues up to either the first field that does not appear in the query conditions (exclusive) or the first field with a range condition (inclusive).

One point not mentioned earlier is that a field with a range condition also ends the use of the fields after it in the index. Why? The explanation requires deeper background, which you can find at the end of the next (second) article.
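The full principle, including the range cutoff, can be written as a short Python function. This is a sketch of the rule as stated in this article, not of any real optimizer; the function name and the `"eq"` / `"range"` labels are hypothetical conventions chosen for the example.

```python
def usable_index_fields(index_fields, conditions):
    """Return the index fields a query can use under leftmost prefix matching.

    index_fields: field names in index order, e.g. ["col_a", "col_b", "col_c"].
    conditions:   maps a field name to "eq" (equality) or "range".
    """
    used = []
    for field in index_fields:
        kind = conditions.get(field)
        if kind is None:     # field absent from the query: stop (exclusive)
            break
        used.append(field)
        if kind == "range":  # range condition: use it, then stop (inclusive)
            break
    return used

idx_i2 = ["col_a", "col_b", "col_c"]
print(usable_index_fields(idx_i2, {"col_a": "eq", "col_b": "eq"}))
print(usable_index_fields(idx_i2, {"col_b": "eq"}))
print(usable_index_fields(idx_i2, {"col_a": "range", "col_b": "eq"}))
```

The first call matches the idx_i2 example above, the second shows a query that skips the leftmost field, and the third shows a range condition on col_a cutting off col_b.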

What is a clustered index?

The radical directory and the pinyin directory exist side by side, yet there is only one copy of the actual dictionary content. In the same way, a database table can have multiple indexes. So what are the differences between different indexes?

On the side of the Xinhua Dictionary we can see black squares arranged in a V shape, and many people write A, B, C, D and so on next to them for the corresponding pinyin initials. Because all the characters in the dictionary are arranged in pinyin order, it is sometimes quicker to open straight to the right part by the first letter.

An index like the pinyin directory, where the data itself is sorted and stored in the order of the index, is called a clustered index; any other ordinary index is a non-clustered index. Because data can be physically sorted by only one rule, a table can have at most one clustered index but multiple non-clustered indexes.

In MySQL’s InnoDB storage engine, the primary key index is the clustered index, and all data is organized by it. The MyISAM storage engine has no clustered index, because MyISAM does not store data in index order.
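The difference between the two kinds of index can be sketched in Python using the dictionary itself as the model. Here the rows are stored in pinyin order (the clustered index), and the radical index is a separate sorted list that only points back to the pinyin key. The data and names are hypothetical apart from page 521, and the choice to have the secondary index store the clustering key is an assumption modeled on how InnoDB secondary indexes refer to the primary key.

```python
from bisect import bisect_left

# Clustered index: the rows themselves are stored in pinyin order.
# Hypothetical rows: (pinyin, radical, page); only page 521 is from the text.
rows = [("an", "roof", 4), ("bao", "hand", 17), ("xian", "ear", 521)]
pinyin_keys = [row[0] for row in rows]

# Non-clustered index: a separate sorted list of (radical, pinyin) pairs.
# It holds no row data, only the key needed to find the row.
radical_index = sorted((radical, pinyin) for pinyin, radical, _ in rows)

def find_by_pinyin(pinyin):
    """Clustered lookup: the search lands directly on the stored row."""
    pos = bisect_left(pinyin_keys, pinyin)
    if pos < len(rows) and pinyin_keys[pos] == pinyin:
        return rows[pos]
    return None

def find_by_radical(radical):
    """Non-clustered lookup: find the key first, then fetch the row."""
    pos = bisect_left(radical_index, (radical,))
    found = []
    while pos < len(radical_index) and radical_index[pos][0] == radical:
        found.append(find_by_pinyin(radical_index[pos][1]))
        pos += 1
    return found

print(find_by_pinyin("xian"))
print(find_by_radical("ear"))
```

The rows can be physically ordered only one way, which is why there is a single `rows` list, while any number of extra lists like `radical_index` could be built alongside it.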