Many of you are familiar with MySQL indexes, the data structures that help MySQL retrieve data efficiently.

Because indexes are a core MySQL topic, I believe many people have some understanding of them, especially since they come up so often in interviews. I thought I knew a lot about MySQL indexes, and I had reviewed the topic thoroughly because I am currently interviewing for jobs.

I did not realize, however, that my knowledge of indexes was only at elementary-school level until Ali's interviewers took me apart.

The following is a summary of the index questions from my interview at Ali, together with the knowledge points behind them.

Index concept, index model

How did we get onto indexes? I mentioned that our business volume is fairly large, with millions of new rows generated every day, and that led to the following dialogue:

Interviewer: Do you keep such a large amount of data every day in a relational database?

Me: Yes, we use MySQL in production.

Interviewer: Millions of rows a day means tens of millions a month. Have you done any query optimization?

Me: We created some indexes in the database. (I regret saying that now.)

You can see here that Ali's interviewers do not work through a fixed list of questions like some companies do; they expand on what the candidate has done and on whatever comes up during the interview.

Interviewer: Can you tell me what an index is?

Me: An index is a data structure that lets us retrieve data from the database quickly.

Interviewer: What kind of data structure does the index use?

Me: MySQL mainly has two kinds of index structure: hash indexes and B+ tree indexes. We use the InnoDB engine, whose default is the B+ tree.

I played a trick here by tying the index to the storage engine, hoping the interviewer would ask me about storage engines.

Interviewer: Since you mentioned that InnoDB uses the B+ tree index model, do you know why? How does it compare with a hash index?

Me: The structure underlying a hash index is a hash table, a key-value store in which entries are kept in no particular order. A range query therefore cannot be answered directly from the index and needs a full table scan, so hash indexes only suit equality lookups. A B+ tree is a multi-way balanced search tree whose keys are naturally ordered (smaller keys sit to the left and larger keys to the right), so a range query does not need a full table scan.
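The contrast in this answer can be illustrated with a small Python sketch. It is not how InnoDB is implemented: a dict stands in for a hash index, a sorted list with binary search stands in for the ordered leaf level of a B+ tree, and the sample keys and the names hash_range and ordered_range are made up for illustration.

```python
import bisect

# Hypothetical sample data: key -> row.
rows = {17: "r17", 3: "r3", 42: "r42", 8: "r8", 25: "r25"}

# Stand-in for a hash index: fast equality lookups, but keys have no order.
hash_index = dict(rows)

# Stand-in for the ordered leaf level of a B+ tree: keys kept sorted.
sorted_keys = sorted(rows)


def hash_range(lo, hi):
    """Range query on the hash index: every entry has to be examined."""
    hits, examined = [], 0
    for key, row in hash_index.items():
        examined += 1
        if lo <= key <= hi:
            hits.append(row)
    return hits, examined


def ordered_range(lo, hi):
    """Range query on sorted keys: binary search to the start, then scan."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    keys = sorted_keys[start:end]
    return [rows[k] for k in keys], len(keys)
```

Calling hash_range(8, 25) and ordered_range(8, 25) returns the same rows, but the first examines every entry while the second touches only the keys inside the range and returns them already sorted.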

Interviewer: Apart from range queries, can you name any other differences?

Me: (I did not answer this question well and looked it up on Baidu afterwards.)

B+ tree index versus hash index:

Hash indexes suit equality queries but cannot serve range queries.
Hash indexes cannot be used for sorting.
Hash indexes do not support the leftmost matching rule of multi-column composite indexes.
When there are many duplicate key values, hash indexes become inefficient because of hash collisions.

Clustered index, covering index

Interviewer: Just now we talked about the B+ tree. Do you know what is stored in its leaf nodes?

Me: In InnoDB, a B+ tree leaf node may store a whole row of data, or it may store a primary key value.

Interviewer: What's the difference between the two?

Me: (When he asked about leaf nodes, I guessed he would go on to clustered and non-clustered indexes.) In InnoDB, an index whose B+ tree leaf nodes store the whole row is the primary key index, also known as the clustered index. An index whose leaf nodes store the primary key value is a non-primary-key index, also known as a non-clustered index.

Interviewer: So, is there a difference between a clustered index and a non-clustered index when querying data?

Me: Queries on the clustered index are faster?

Interviewer: Why?

Me: Because a leaf node of the primary key index tree directly holds the whole row we want. A leaf node of a non-primary-key index holds only the primary key value, so after finding it you have to query again using that primary key value.
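A rough Python sketch of the two lookup paths just described. Dicts stand in for the index trees, the sample rows and the names clustered, secondary_name, find_by_id, and find_by_name are hypothetical, and the secondary key is assumed to be unique to keep the example short.

```python
# Stand-in for the clustered (primary key) index: primary key -> whole row.
clustered = {
    1: {"id": 1, "name": "alice", "city": "Hangzhou"},
    2: {"id": 2, "name": "bob", "city": "Beijing"},
}

# Stand-in for a secondary index on name: name -> primary key value only.
secondary_name = {"alice": 1, "bob": 2}


def find_by_id(pk):
    """One index traversal: the leaf already holds the whole row."""
    return clustered[pk], 1


def find_by_name(name):
    """Two traversals: the secondary index, then the clustered index."""
    pk = secondary_name[name]  # lookup 1: find the primary key value
    row = clustered[pk]        # lookup 2: fetch the row by primary key
    return row, 2
```

Both functions return the same row for the same person; the second simply needs one more traversal to get there.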

Interviewer: You mentioned that a primary key index query needs only one lookup, while a non-primary-key index needs more than one. (This process, I later learned, is called going back to the table.) Is this true in all cases? Must a non-primary-key index always be queried more than once?

Me: (I did not answer this well either. I later looked it up and learned that with a covering index, one lookup is enough.)

Popular science time – covering index

A covering index means that the columns a query needs can all be obtained from the index alone, without reading the data table. It is also called index coverage. When a query meets the conditions for a covering index, MySQL returns the required data from the index only, which avoids going back to the table after the index lookup, reduces I/O, and improves efficiency. For example, suppose the table covering_index_SAMPLE has an ordinary index IDx_KEY1_KEY2 (key1,key2). The statement SELECT key2 FROM covering_index_SAMPLE WHERE key1 = 'keytest'; can then be answered from the covering index without going back to the table.
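The same idea as a minimal Python sketch. It models the (key1,key2) index as a sorted list of entries that each carry the primary key, and counts how often the table has to be read. The sample rows, the extra column called other, and the function select_columns are assumptions made for illustration only.

```python
import bisect

# Hypothetical table rows: primary key -> (key1, key2, other).
table = {
    1: ("keytest", "a", "payload-1"),
    2: ("keytest", "b", "payload-2"),
    3: ("zzz", "c", "payload-3"),
}

# Entries of the index on (key1, key2), sorted, each carrying the primary key.
index_key1_key2 = sorted((k1, k2, pk) for pk, (k1, k2, _) in table.items())


def select_columns(key1, wanted):
    """Model SELECT <wanted> ... WHERE key1 = ?; return (results, table_lookups)."""
    start = bisect.bisect_left(index_key1_key2, (key1,))
    results, table_lookups = [], 0
    for k1, k2, pk in index_key1_key2[start:]:
        if k1 != key1:
            break
        if wanted == "key2":
            results.append(k2)            # covered: the value is in the index entry
        else:
            table_lookups += 1            # not covered: go back to the table
            results.append(table[pk][2])
    return results, table_lookups
```

Asking for key2 is answered from the index entries alone, while asking for the non-indexed column costs one table lookup per matching entry.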

Composite index, leftmost prefix matching

Interviewer: It doesn’t matter if you don’t know. What factors do you take into account when creating an index?

Me: We generally index fields that are queried often and frequently appear in WHERE conditions.

Interviewer: Have you used composite indexes?

Me: Yes, we have created composite indexes on some tables.

Interviewer: When you create a composite index across several fields, how do you choose their order?

Me: We put the most selective fields first.

Interviewer: Why do you do that?

Me: (This question threw me a little and I got flustered.) So that the hit rate might be a bit higher…

Interviewer: Do you know about leftmost prefix matching?

Me: (I suddenly realized what the interviewer had been getting at and wondered why I had not thought of it earlier.) Oh, right. When creating a multi-column index, we place the column used most often in WHERE clauses on the left, because MySQL index lookups follow the leftmost prefix matching principle: matching starts from the leftmost column of the index. So when we create a composite index such as (key1,key2,key3), it is as if we had created the indexes (key1), (key1,key2), and (key1,key2,key3). This is the leftmost matching principle.
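The leftmost matching rule from this answer can be written down as a tiny Python function. This is a deliberately simplified model: it only considers equality conditions and ignores range conditions and anything else the real optimizer may do. The function name usable_prefix is hypothetical.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that the equality conditions can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the first missing column ends the usable prefix
        used.append(column)
    return used


# The composite index from the text.
INDEX = ("key1", "key2", "key3")
```

With this model, conditions on key1 and key2 use the first two columns, conditions on key1 and key3 use only key1, and conditions on key2 and key3 alone cannot use the index at all.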

Although I was a little confused at first and did not think of leftmost prefix matching, the interviewer guided me there. Very friendly.

Index pushdown, query optimization

Interviewer: Which version of MySQL are you running in production?

Me: We run MySQL 5.7.

Interviewer: Do you know what changes have been made to indexes in MySQL 5.6?

Me: Sorry, I have not looked into that.

I looked it up later, and here is an important one: Index Condition Pushdown. MySQL 5.6 introduced the index condition pushdown optimization, which is enabled by default. You can turn it off with SET optimizer_switch = 'index_condition_pushdown=off';. Suppose the people table has an index on (zipcode, lastname, firstname), and consider this query:

SELECT * FROM people WHERE zipcode = '95054' AND lastname LIKE '%etrunia%' AND address LIKE '%Main Street%';

Without index pushdown, MySQL uses the index to find the entries with zipcode = '95054', reads the full row for each of them, and returns it to the MySQL server, which then checks lastname LIKE '%etrunia%' and address LIKE '%Main Street%'. With index pushdown, MySQL first finds the index entries matching zipcode = '95054' and then checks lastname LIKE '%etrunia%' against the index entry itself, since lastname is part of the index. Only if the entry passes is the full row read and the remaining condition on address, which is not in the index, checked; otherwise the entry is rejected without reading the row. For queries with LIKE conditions like this one, index pushdown reduces the number of trips back to the table.
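To see where the saving comes from, here is a small Python simulation of the query on the people table. It only counts full-row reads with and without pushdown and does not reflect MySQL internals. The sample rows are invented, and a linear scan over the index entries is used instead of a real index seek to keep the sketch short.

```python
# Hypothetical rows of the people table, keyed by primary key.
people = {
    1: {"zipcode": "95054", "lastname": "Petrunia", "firstname": "A", "address": "1 Main Street"},
    2: {"zipcode": "95054", "lastname": "Smith", "firstname": "B", "address": "2 Main Street"},
    3: {"zipcode": "95054", "lastname": "Petrunias", "firstname": "C", "address": "9 Oak Avenue"},
    4: {"zipcode": "10001", "lastname": "Petrunia", "firstname": "D", "address": "3 Main Street"},
}

# Index entries on (zipcode, lastname, firstname), each carrying the primary key.
index = sorted(
    (r["zipcode"], r["lastname"], r["firstname"], pk) for pk, r in people.items()
)


def query(use_icp):
    """Return (matching primary keys, number of full-row reads)."""
    matches, row_reads = [], 0
    for zipcode, lastname, _firstname, pk in index:
        if zipcode != "95054":
            continue
        # With pushdown, the lastname filter is checked on the index entry itself.
        if use_icp and "etrunia" not in lastname:
            continue
        row = people[pk]  # go back to the table for the full row
        row_reads += 1
        if "etrunia" in row["lastname"] and "Main Street" in row["address"]:
            matches.append(pk)
    return matches, row_reads
```

Both modes return the same result; the difference is how many full rows have to be read before the remaining conditions can be checked.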

Interviewer: Do all the indexes you create actually get used? In other words, do your SQL statements all query through an index?

Me: We have not tracked that. We only investigate when we run into slow SQL.

Interviewer: Is there a way to tell whether a query uses an index?

Me: You can view a statement's execution plan with EXPLAIN and analyze index usage from it.

Interviewer: What would cause a query not to use an index you created?

Me: (I vaguely remembered something about the optimizer, but did not answer the question well.)

Popular science time – query optimizer

A SQL query can be executed in several different ways, and the optimizer is responsible for choosing the one with the lowest cost. Before a single-table query is actually executed, MySQL's query optimizer finds all the possible ways of executing the statement and compares them to find the cheapest. That lowest-cost option is the execution plan. The optimization process is as follows: 1. Find all the indexes that could be used, based on the search conditions; 2. Calculate the cost of a full table scan; 3
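The selection described above, comparing candidates and keeping the cheapest, can be sketched in a few lines of Python. The cost figures and index names below are invented for illustration; real estimates come from table and index statistics, and this sketch says nothing about how MySQL computes them.

```python
def choose_plan(candidate_costs):
    """Pick the candidate with the lowest estimated cost."""
    return min(candidate_costs, key=candidate_costs.get)


# Hypothetical cost estimates for one query.
costs = {
    "full table scan": 2037.7,
    "idx_key1": 96.0,
    "idx_key2": 134.5,
}
```

Here choose_plan(costs) picks idx_key1. If the estimate for every index came out higher than the full table scan, the scan would win, which is one reason an index that exists may still not be used.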

Interviewer: Well, that is enough about indexes for now. What transaction isolation level does your production database use?

Me: (I will not go into the transaction isolation questions that followed.) I had the feeling that because I answered these index questions poorly, he was going to keep asking, and I was afraid of being taken apart even more.

The above is the index portion of the interview, with the answers I put together afterwards. During the interview I felt I could give some answer to about 70% of the index questions, but only about 50% of what I said was fully correct. Clearly I did not know indexes well enough.

This interview showed me that big companies like Ali pay a lot of attention to fundamentals. I thought the hardest index question I would get was the difference between hash and B+ tree indexes; I did not expect to end up being asked about the query optimizer.

Finally, whether or not I pass this interview, I am grateful for the chance to see my own shortcomings. I learned a lot from it. Keep going!