All my articles are here, this article GitHub github.com/JavaFamily has been included, there is a line of large factory interview complete test site, the end of the article has welfare.
preface
MySQL, Oracle, indexes, stored procedures, query optimization, etc.
I don’t know if you think the same as me, I want to write index, why?
Here is an interview scenario that I don’t know if you are familiar with:
Interviewer: The database has tens of millions of data, and the query is very slow. What should we do?
Interviewer: Index.
Interviewer: What are the data types of the index? What is the structure of an index? Which fields are suitable for indexing? The advantages of B+? What is the difference between an aggregated index and a non-aggregated index? Why do indexes slow down maintenance tasks such as inserts, deletions, and modifications? … .
Interviewer: How did the interviewer get out of our company 😂?
Yes, you probably know why the index is added slowly, what fields are added, and the data structure characteristics of the index, advantages and so on are obscure or even unknown.
So let’s cut to the BB and get right to the interview.
The body of the
I see your resume says you are familiar with MySQL databases and indexes. Let’s start with indexes. What are the data structures of indexes?
Hash, B +
When you design an index, you will find that the index type is optional.
Why hash tables, fully balanced binary trees, B trees, B+ trees can optimize queries, why Mysql only likes B+ trees?
Let me talk about Hash:
You can take a look at the GIF below
Note that the array subscripts corresponding to the field values are randomly calculated by the hash algorithm, so hash conflicts may occur.
So for such an index structure, now execute the following SQL statement:
Select * from sanguo where name=’ right ‘
Select * from ‘egg’; select * from ‘egg’; select * from ‘egg’;
Select * from sanguo where name> 0
Can’t help, because the characteristics of hash table can be fast and accurate query, but does not support range query.
If made index, that speed is also very slow, to scan all.
As an aside, which scenarios are Hash tables suitable for?
Equivalent query scenarios only include KV (Key, Value), such as Redis, Memcached and other NoSQL middleware.
You’re talking about unordered Hash tables, but is there an ordered data structure?
Ordered arrays, it’s Nice, it’s Nice for equivalence and range queries.
Does it have no drawbacks at all?
No, ordered data is good for static data, because if we add, delete, or modify data, we change its structure.
For example, if you add a new node, all the nodes will move back after your new location, which is very expensive.
Well, he’s not good at all, and there’s no room for that.
It can be used to do static storage engine ah, used to save static data, such as your 2019 Alipay bill, 2019 Taobao shopping record and so on are very appropriate, are not changing historical data.
There’s something there, young man. What about binary trees?
The new and structure of binary tree are shown as follows:
Binary tree structure I will not be here more BB, do not know the friends can go to see the data structure chapter.
Binary trees are ordered, so range queries are supported.
But his time is order log of N, and in order to maintain that, the update has to be order log of N, so it has to be a perfectly balanced binary tree.
Why did you say that balanced binary trees are good for indexing?
The index is not only stored in memory, but also persisted in disk. As you can see from the graph, if there is more data, the tree height will be high, and the query cost will increase as the tree height increases.
In order to save the cost of a lot of companies or the use of mechanical hard disk, such a ten million level of inquiry about 10 seconds, this who can stand ah?
What if I use B trees?
Similarly, let’s look at the structure of B tree:
It can be found that the representation of the same element in a B tree is “shorter” than a fully balanced binary tree because a node in a B tree can store multiple elements.
B tree is already a good data structure, used for indexing is good.
So why do we use B+ trees instead of B trees?
Again, let’s look at the structure of B plus:
We can find that the representation of the same element in B+ tree is “fatter” than that in B+ tree, because there is a redundant copy of non-leaf nodes in B+ tree in leaf nodes, and leaf nodes are connected by Pointers.
So what’s so good about B+ trees?
If you look at the data structure above, the original Hash does not support range query, and the binary tree is very tall, only B tree is equal to B+.
A node of a B tree can store multiple elements. Compared with a fully balanced binary tree, the tree height is reduced and disk I/O efficiency is improved.
The B+ tree is an upgraded version of the B tree, but the non-leaf nodes are redundant. The advantage of doing so is to improve the efficiency of range search.
The reason for the increase is that there is a pointer to the leaf node of the next node.
Summary: Mysql uses B+ tree as an index structure to improve disk I/O efficiency and range query efficiency, and the elements in B+ tree are also ordered.
So, do you have any idea how many elements in a node of a B+ tree are appropriate?
Uh, this, this? I’m a little confused.
After a while or did not come up with, can only honestly explain: this is not very understand cough cough.
Another way you can think about it is how big is a node in a B+ tree?
A B+ tree where a node is a page or multiple of a page is most appropriate.
Why?
If the size of a node is less than 1 page, then reading this node will actually read 1 page, resulting in a waste of resources.
If the size of a node is greater than 1 page, such as 1.2 pages, two pages will be read when the node is read, resulting in a waste of resources.
Therefore, in order to avoid waste, it is most appropriate to control the size of a node in multiples of 1, 2, 3, 4 pages.
You mentioned the concept of pages, can you tell me briefly?
First, the basic storage structure of Mysql is a page (where records are stored) :
-
Each data page can form a bidirectional linked list
-
The records in each data page can form a one-way linked list
-
– Each data page generates a page directory for the records stored in it. When searching for a record through the primary key, dichotomy can be used to quickly locate the corresponding slot in the page directory, and then traverse the records in the corresponding group of the slot to quickly find the specified record
-
Use other columns (non-primary keys) as search criteria: Each record in a single-linked list can only be traversed starting with the smallest record.
So, if we write a SQL statement like select * from user where username=’ c ‘without any optimization, the default is:
-
Navigate to the page where the record resides
-
– Need to traverse the bidirectional list to find the page
-
Find the corresponding record from the page you are on
-
– Because the query is not based on the primary key, only the single linked list of the page can be traversed
Obviously, in the case of a large amount of data this lookup is very slow! It looks a little bit like a return watch.
Oh? Let me talk to you about it.
Oh, my God. Damn it. What did I say.
Select * from ‘ID’; select * from ‘name’; select * from ‘ID’;
Select * from table where name = ‘c’
Select * from primary key where id = 2; select * from primary key where id = 2;
Back to the primary key index tree search process, is back to the table. However, there is a way to avoid back tables, and that is to overwrite indexes.
Oh? Why don’t you talk to me again about overwriting the index?
!!!!!!!!! My mouth…
Select * from table_name, select * from table_name, select * from table_name, select * from table_name, select * from table_name.
Overwriting indexes can reduce the number of tree searches and improve performance, which is often used to optimize query efficiency in the actual development process.
Many syndication indexes are created to support overwriting indexes, which can greatly improve efficiency for specific businesses.
Do you know the leftmost matching rule for indexes?
Leftmost matching principle:
- An index can be as simple as one column (a) or as complex as multiple columns (A, B, C, D), known as a joint index.
- In the case of a joint index, the key is also composed of multiple columns, and the index can only be used to check whether the key exists (equal). In the case of a range query (>, <, between, like left match), the index cannot be further matched, and the subsequent degradation of linear search.
- Therefore, the order of the columns determines the number of columns that can be hit by the index.
Example:
- If there are indexes (A,b,c, and d) and the query conditions are a=1 and B =2 and C >3 and d=4, a, B, and C will be matched on each node in sequence, but D will not be matched. (c is already a range query, d is certainly not sorted)
conclusion
Index is a very important knowledge point in database!
The above is actually the most basic index things, N tree, jump table, LSM I did not cover, at the same time to create a good index to take into account a lot of aspects:
- Left-most prefix matching rule. This is a very important, very important, very important principle, MySQL will keep matching to the right until it encounters a range query (>,<,BETWEEN,LIKE) and stops matching.
- Select columns with high distinctness as indexes. The distinctness formula is COUNT(DISTINCT Col)/COUNT(*). Represents the rate at which fields are not duplicated. The higher the rate, the fewer records we scan.
- Index columns cannot participate in the calculation; try to keep the columns “clean”. For example, if FROM_UNIXTIME(create_time)=’2016-06-06′, the index cannot be used, because the B+ tree stores the values of the fields in the table, but requires the function to compare all the elements. Create_time =UNIX_TIMESTAMP(‘2016-06-06’)
- Expand the index as much as possible, do not create new indexes. For example, if you want to add (a,b) to a table that already has an index of A, you only need to modify the original index.
- A single multi-column composite index performs a different retrieval query than multiple single-column indexes because when SQL is executed,
MySQL can only use one index, the most restrictive index is selected from multiple single-column indexesMySQL5.0 and later, there is a “merge index” policy. The author of the book “High Performance MySQL 3” says:Rather than relying on a “merge index” strategy, you should build better indexes). - The “merge index” strategy simply means using multiple single-column indexes and combining the results with “union” or “and”
Literature reference:
MySQL In Action
High Performance MySQL
The last part is from -> Java3y Indexes and Locks
MySQL In Action
omg
I posted a video on station B:
Everyone’s feedback is ok. I will try more in the future, and I hope you can give me feedback on improvement suggestions.
I shot my first super rough vlog last year:
Because the shooting and editing techniques are rubbish, I deleted them, but recently I thought of putting them on, I am struggling haha, I want to leave a message, haha, we will next period.
Today, Bingbing also started to work for the first time after 16 days in Hangzhou. I am very happy that our company is among the first batch of people to resume work in Hangzhou. I have not talked to anyone like this for 16 days.
This familiar station, this familiar monitor, my corner of the eye and…
White piao is not good, creation is not easy, everyone’s praise is the biggest power of creation, we see the next article!
Continue to update, to be continued…
This article is updated every week, you can search “Santaizi Aobing” on wechat for the first time to read, reply [information] [interview] [resume] I prepared the interview materials and resume template, GitHub github.com/JavaFamily has included this article, there is a complete test site for the interview, welcome Star.
The more you know, the more you don’t know