There is a small bonus at the end of this post, as an apology for taking so long.

Friends, I am back. It has been about a month since my last post. During that time a lot of personal matters came up, so I could not keep updating my articles on the usual schedule.

Last week I started reading a little book I bought on Nuggets (Juejin) a long time ago, Understanding How MySQL Works from the Roots, which had been gathering dust in my account ever since I bought it.

Recently the term MVCC kept popping up on my timeline in some Java discussion groups, and I took it as a cue to properly brush up on MySQL.

So I spent a week going through the book from beginning to end; some sections I read four or five times. Even so, I think I only understood about a third of it, not even half, which is honestly embarrassing. A book that claims to explain MySQL from the root touches on a huge number of MySQL concepts, and it is really hard to sort them all out within a week.

On the whole it is a good book, packed with solid content; you can tell from the material that the author has a strong foundation, and it is worth a second and third read. (The book seems to be about to be published in print, apparently by China Machine Press; it has been in the works for half a year and should be out within a month.)

I am not writing a promotional piece here, nor trying to sell anything; I purely want to share my impressions after reading this book recently. It is very suitable for readers who can write CRUD but do not yet know MySQL very well.

Just to give you a quick map of what the book is about:

I have divided the book's content into seven parts, and below I will give my take on each part of the map.

InnoDB

If you look at the table of contents, you can see that the first four chapters are all basics: starting mysql-server, connecting with mysql-client, changing character sets, and so on. I have not marked this part on the map; a general understanding is enough.

After these basics comes the first hard dish in the book: digging into how InnoDB works.

InnoDB is the most commonly used MySQL storage engine, and the author writes the most about it. Because it is the engine that supports transactions and row locks, a lot of the content in this part echoes the rest of the book.

The author starts by explaining how InnoDB manages data internally:

A table record is stored on disk in a layout called a row format. Taking MySQL 5.7 as an example, the book lists the structure and characteristics of its four row formats (REDUNDANT, COMPACT, DYNAMIC, and COMPRESSED).
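
If you want to play with this yourself, here is a minimal sketch (the table t_demo is just a made-up example) of picking a row format explicitly and checking which one a table actually uses:

    CREATE TABLE t_demo (
        id INT PRIMARY KEY,
        name VARCHAR(100)
    ) ENGINE=InnoDB ROW_FORMAT=COMPACT;

    -- Check which row format the table actually uses
    SELECT TABLE_NAME, ROW_FORMAT
    FROM information_schema.TABLES
    WHERE TABLE_NAME = 't_demo';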

So where do these row-format records live? Records are stored on a data structure called a page, and MySQL has many different page types for different purposes.

Each page is 16KB by default. MySQL reads and persists data in units of pages, which means at least 16KB is read or written at a time.
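
You can check the page size on your own instance (16KB is only the default; it is fixed when the instance is initialized):

    SHOW VARIABLES LIKE 'innodb_page_size';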

Whenever data is written, it goes into the corresponding data page until the page is full and the next page is used. When data is deleted, it is only marked as deleted rather than physically removed; the deleted records form a garbage list whose space waits to be reused by newly written records.

So pages exist, but who do pages belong to? This is where the concept of tablespaces comes in.

Each table corresponds to a tablespace file (table_name.ibd), and this tablespace file is the actual file that stores the data. Records are managed by pages, and pages are managed by the tablespace; it is a hierarchical relationship.
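
Whether each table gets its own .ibd file depends on the file-per-table setting, which you can check like this:

    -- When ON (the default in 5.7), every table gets its own table_name.ibd file
    -- under the data directory; when OFF, data goes into the shared system tablespace
    SHOW VARIABLES LIKE 'innodb_file_per_table';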

However, the jump from tablespace to page is too big (a tablespace corresponds to all of a table's data, while a page is only 16KB, so once the data grows a little, one tablespace corresponds to tens or even hundreds of thousands of pages). So the tablespace has intermediate layers: extents and groups.

64 pages form an extent (64 × 16KB = 1MB), and 256 extents form a group (256MB), so the logical hierarchy becomes: record -> data page -> extent -> group -> tablespace.

To speed things up, the book then introduces the cache that almost every system has: the buffer pool.

The buffer pool greatly improves the efficiency of querying data pages, because disk I/O is just too slow. When MySQL starts up, it allocates a chunk of memory to serve as the buffer pool for data pages (a point worth bringing up when talking about database optimization), and it uses LRU-style algorithms to control which pages stay in the pool (keeping the most frequently accessed ones; this part is fairly complicated).
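
A couple of quick ways to look at the buffer pool on your own instance (just a sketch; the exact counters vary a little by version):

    -- How much memory is reserved for the buffer pool
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

    -- Total logical page read requests
    SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests';
    -- Reads that could not be satisfied from the pool and had to go to disk
    SHOW STATUS LIKE 'Innodb_buffer_pool_reads';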

That is basically the end of the InnoDB fundamentals. The most complicated part of InnoDB is the tablespace, because it involves things like indexes; the tablespace chapters are the hardest part of the whole book, and that is also the impression you get from the comments section of the book.

Indexes

When I picked up the book, indexes were what I most wanted to read about, but I did not expect that part to be so short: only two chapters, one on how indexes work and one on how to use them.

In fact, when I read a book like this, I usually read comparatively: I read what the author wrote, then go online to see what others have written, and put the two together to understand the topic.

The index chapters were the same, and they are fairly easy to understand. As the author explains, an index is built on top of records that are already sorted by primary key. Because the primary key is ordered, these records become the nodes of a B+ tree: the leaf nodes hold the full records, while the non-leaf nodes hold directory entries, which you can think of as a table of contents that groups the leaf nodes. And because it is a B+ tree, a binary-search-style lookup can quickly locate the right node.

The index whose leaf nodes contain the complete records is called the clustered index. It is generated automatically by InnoDB, and it takes up a lot of space precisely because its leaf nodes hold all of the records.

The indexes we create by hand are called secondary (auxiliary) indexes. A secondary index also forms a B+ tree, but its leaf nodes do not store the full rows; they store the indexed columns together with the corresponding primary key values. After you get the primary key from the secondary index, you still have to use it to look up the full row in the clustered index; this step is called going back to the table.

If a secondary index is created on several columns, it is called a joint (multi-column) index. Related to this is the covering index: if all the columns a query needs are already in the secondary index, the back-to-table step can be skipped entirely.
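
A small sketch of this, using a made-up person table: the joint index idx_name covers the first query, while the second one has to go back to the clustered index.

    CREATE TABLE person (
        id INT PRIMARY KEY,
        last_name VARCHAR(50),
        first_name VARCHAR(50),
        birthday DATE,
        KEY idx_name (last_name, first_name)  -- joint (multi-column) secondary index
    ) ENGINE=InnoDB;

    -- Covered by idx_name (indexed columns plus the primary key are enough):
    -- no back-to-table lookup, EXPLAIN shows "Using index" in the Extra column
    EXPLAIN SELECT id, last_name, first_name FROM person WHERE last_name = 'Smith';

    -- birthday is not in idx_name, so MySQL typically takes the primary keys found
    -- in idx_name and goes back to the clustered index for the full rows
    EXPLAIN SELECT birthday FROM person WHERE last_name = 'Smith';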

Beyond that, the author also mentions that MyISAM has no clustered index; all of its indexes are secondary indexes, but they differ slightly from InnoDB's: a MyISAM index records the row's address (row number) rather than the primary key, so the engine can find the row data directly from that address and never needs to go back to the table.


Once you understand how indexes work, the guidelines for using them are much easier to follow. For example (there is a small sketch right after this list):

  1. Keep the types of index columns as small as possible (it reduces the space the index needs)
  2. Remove duplicate and redundant indexes from the table (ideally each index column appears in only one index)
  3. Use an auto-increment primary key (out-of-order primary keys force frequent rearranging of B+ tree nodes, because the B+ tree relies on primary key ordering)
  4. When using a joint (multi-column) index, follow the column order used when the index was created, i.e. the leftmost-prefix rule
  5. Fuzzy matching can use the index only when the left prefix is fixed (index values are sorted from left to right, so fixing the left part lets the index narrow down the range)
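
A quick sketch of points 4 and 5, still using the made-up person table and its idx_name (last_name, first_name) index from above:

    -- Uses idx_name: the leftmost column is fixed
    EXPLAIN SELECT * FROM person WHERE last_name = 'Smith' AND first_name = 'Anna';

    -- Still uses idx_name: the left prefix 'Sm%' is fixed, so the index can narrow the range
    EXPLAIN SELECT * FROM person WHERE last_name LIKE 'Sm%';

    -- Cannot use idx_name for lookup: the leftmost column is not restricted at all
    EXPLAIN SELECT * FROM person WHERE first_name = 'Anna';

    -- Cannot use idx_name: the prefix is not on the left
    EXPLAIN SELECT * FROM person WHERE last_name LIKE '%ith';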

There are many other caveats, but the book basically covers them all, so I won't go over them here.

Single-table access methods

Querying by primary key with an equality condition is extremely fast. This access method is called const, meaning a constant-level lookup whose cost can basically be ignored.
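
For example (again with the made-up person table from the index section), a primary-key equality lookup is reported as const by EXPLAIN:

    EXPLAIN SELECT * FROM person WHERE id = 42;  -- the "type" column shows const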

These access methods are simply things you have to know, because they are essentially conventions, and this part is a prerequisite for Explain: reading Explain output relies on exactly this knowledge.

Explain

Explain is also covered very clearly in plenty of online posts, since it is mostly conceptual knowledge. Back when I learned MySQL from Zhou Yang's course, it covered Explain as well.

Explain mainly lists which indexes a query statement uses and roughly how many rows are affected. Personally I find it of limited use on its own, because before an SQL statement reaches the executor it goes through the optimizer, and the optimized SQL may no longer be the same as what you originally wrote; you need to understand the optimizer to understand that rewriting process.

In that case you can use the optimizer trace (optimizer_trace) to see how the SQL was executed and what the optimizer actually did to it.
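
A minimal sketch of using it (the traced query is just the made-up example from before):

    SET optimizer_trace = 'enabled=on';

    SELECT * FROM person WHERE last_name = 'Smith' AND id > 10;

    -- The trace shows how the statement was rewritten and how each candidate plan was costed
    SELECT * FROM information_schema.OPTIMIZER_TRACE\G

    SET optimizer_trace = 'enabled=off';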

    SHOW STATUS LIKE '%last_query_cost%';

This statement shows the cost of the previous query. When the optimizer works on a statement, it estimates the cost of the different candidate plans and finally picks the one with the lowest cost. I find this a fairly intuitive way to compare the performance gap between two SQL statements I have written myself.

Subquery optimization

In the optimizer section, the book mainly describes the rewriting strategies the optimizer applies, with the main focus on how subqueries are optimized.
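
For instance, an IN subquery like the one below (orders is a made-up table) is something MySQL can often rewrite as a semi-join instead of running the subquery for every outer row; EXPLAIN will show the rewritten plan:

    EXPLAIN SELECT * FROM person
    WHERE id IN (SELECT person_id FROM orders WHERE amount > 100);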

Transactions

InnoDB basics, indexes, transactions, and locks are, in my view, the four most important things to understand in this little book.

There are a lot of concepts around transactions, but the author focuses on the mechanisms MySQL uses to guarantee transactions and to roll them back.

MySQL uses the redo log to record the changes made while a transaction executes, and the undo log to record the values as they were before the changes. Each transaction has a globally unique transaction ID.

The redo log is used to restore committed changes when the system restarts after a crash; the undo log is used to restore the old values when a transaction needs to be rolled back.
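
A tiny sketch of the two logs at work (the table is still the made-up person one): the undo log is what makes ROLLBACK possible, and the redo log is what lets a committed change survive a crash.

    BEGIN;
    UPDATE person SET first_name = 'Anne' WHERE id = 42;  -- old value goes into the undo log
    ROLLBACK;                                              -- the undo log restores the old value

    BEGIN;
    UPDATE person SET first_name = 'Anne' WHERE id = 42;
    COMMIT;                                                -- the change is made durable via the redo log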

MVCC (Multi-Version Concurrency Control) is MySQL's solution for concurrent reads and writes.

In highly concurrent applications, the database usually has multiple transactions open at the same time, executing interleaved, which can cause the classic transaction concurrency problems: dirty writes, dirty reads, non-repeatable reads, and phantom reads.

Here’s a review of these four concepts:

  1. Dirty write: a transaction overwrites data that another, uncommitted transaction has already modified
  2. Dirty read: a transaction reads data that another transaction has not yet committed
  3. Non-repeatable read: within one transaction, re-reading the same data returns values that other transactions committed after this transaction started
  4. Phantom read: two queries with the same conditions inside one transaction return different row sets; the later query sees rows the earlier one did not
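
Which of these problems you can actually observe depends on the isolation level; REPEATABLE READ is InnoDB's default. You can check and change it per session:

    -- On MySQL 5.7 the variable is tx_isolation; on 8.0 it is transaction_isolation
    SHOW VARIABLES LIKE '%isolation%';

    SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;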

All four of these problems could be solved purely with locking, but performance would inevitably suffer, so MySQL chooses to do it like this:

Dirty writes are handled with locking: at any point in time only one transaction can write to a given record, which rules out dirty writes.

Reads rely on MVCC: the undo log maintains a version chain for every record, and each version on the chain carries the ID of the transaction that produced it. When the current transaction runs a query, MySQL uses the transaction IDs (via a read view) to work out which versions on the chain are visible to it, thereby avoiding dirty reads, non-repeatable reads, and phantom reads.

Some articles online describe MVCC as "version chain + optimistic locking". After finishing the book I went back and compared those articles with the book's source-level explanation of MySQL, and the book matches what I described above: visibility is determined by calculating (or judging) which version can be seen. Optimistic locking usually means comparing the new value against the old value before writing, which is actually not very similar to how MVCC really works.

After the transaction chapters, I realized that MVCC, which interviewers love to ask about, is not so mysterious; it can be explained clearly in a sentence or two. It also occurred to me that the AT mode of Seata, the distributed transaction framework, is implemented with much the same idea as MySQL.

Locks

With all the groundwork laid in the earlier chapters, the lock chapter feels comparatively simple. However, the book only briefly introduces how locks are classified, that is, which kinds of locks MySQL usually uses and what their characteristics are; it does not go further and analyze, say, what the locking looks like when two transactions modify the same record.

That is not because the author never wrote about it; it is published on the author's official account instead, where you can find three hands-on lock analysis articles.

Locks can be divided along two dimensions: granularity and characteristics. (This is my own way of putting it, of course.)

  • Granularity: based on the scope affected, locks are either row-level or table-level.
  • Characteristics: based on how they behave, locks are either exclusive or shared.

Combining the two dimensions gives four common types: row-level exclusive locks, row-level shared locks, table-level exclusive locks, and table-level shared locks.
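
In SQL these roughly correspond to the following statements (MySQL 5.7 syntax; person is still the made-up table, and the locking reads should be issued inside a transaction):

    -- Row-level shared lock on the matching record
    SELECT * FROM person WHERE id = 42 LOCK IN SHARE MODE;

    -- Row-level exclusive lock on the matching record
    SELECT * FROM person WHERE id = 42 FOR UPDATE;

    -- Table-level shared / exclusive locks
    LOCK TABLES person READ;
    UNLOCK TABLES;
    LOCK TABLES person WRITE;
    UNLOCK TABLES;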

Besides the concrete kinds of locks, the author also introduces the in-memory structure that represents a lock. Although I said earlier that this chapter is important, I did not dig very deep into it myself, because it is mostly conceptual memorization.

In fact, this chapter is best read together with the hands-on lock analysis articles on the official account: the chapter is like a first introduction before the practical analysis. If you jump straight into the practical articles instead, the pile of unfamiliar concepts will probably make your head spin.

Summary

Writing this article left me a bit embarrassed, because there is a lot more knowledge I wanted to write about but could not (at most 30% of the content here comes from the book). I had originally meant to walk through the book's content in order and in detail, but that road is closed (it would be infringement, plagiarism), so the result feels a bit anticlimactic (I highly recommend looking at my mind map instead).

Still, I did write up the high-frequency topics, such as how indexes and transactions work (even if somewhat simplified). That is roughly the depth most developers should reach: if you can keep these points firmly in mind, you can genuinely go deep in an interview, and explaining one thing in depth is more useful than reciting a pile of shallow knowledge.

Overall, I do recommend this book. It is a book that digs deep into principles; a few more application-level topics such as the binlog are not covered, but the rest is basically very detailed. With a promo code it should cost less than 20 RMB, which is good value for money.

At the beginning of the article I mentioned a small bonus: I will give away three discount codes for the book (not sponsored; they are all left over from writing activities I took part in earlier). Given the lesson from the last giveaway (there were no private messages, so I could not reach the winner for a long time), this time I plan to run the draw on my official account instead.

  • Follow my official account of the same name ("And Ears") and reply with the keyword "lucky draw" to get the draw link; the draw is at 12 noon on October 1st.
  • To get my MySQL mind map file, reply with the keyword "MySQL".

Well, that's it for today's article. Your support is very welcome. I am And Ears; let's keep putting out knowledge and growing together.