• Old Liu is a graduate student about to start job hunting who taught himself big data development. Along the way he felt deeply that the big data material online is of very mixed quality, so he decided to write a detailed guide to big data development. This guide covers big data [basics], [framework analysis], and [source-code reading], all explained in his own words, so that readers who self-study never have to beg others for help again.
  • Big Data Development Guide address is as follows:
    • github:https://github.com/BigDataLaoLiu/BigDataGuide
    • Gitee: https://gitee.com/BigDataLiu/BigDataGuide
  • Your likes are the motivation that keeps me updating. Please don't just freeload; if you find this helpful and need anything, contact the public account: hardworking Old Liu.

Today I am sharing the first MySQL article in the basics section of big data development. Old Liu explains things a little differently from others. Many readers know MySQL's basic features and usage, but few understand the principles underneath. Learning only the surface of a topic is never enough, so Old Liu will do his best to explain MySQL's architecture to everyone!

The outline of the MySQL architecture section is as follows:

After reading Lao Liu’s article, I hope you can grasp the following contents:

  1. The components of MySQL and their functions
  2. MySQL's simplified and detailed execution flows
  3. The differences between MyISAM and InnoDB and their usage scenarios
  4. MySQL log file concepts and related functions

First, logical architecture

Logical architecture diagram

First, here is MySQL's logical architecture diagram. As you can see, MySQL is made up of many modules, each of which plays an important role. The concepts and functions of each module are described below.

The connector

Connectors are the client APIs that let programs written in different languages interact with MySQL through SQL.

System management and control tools

Used for administrative tasks such as backup and recovery and cluster management.

The connection pool

Manages client connections and performs tasks such as authentication and permission validation.

SQL interface

The SQL interface receives SQL commands (such as DDL and DML) and returns query results. But a received SQL command is just text; for the system to recognize and act on it, the statement must be parsed, and that is the parser's job.

The parser

Parsing is divided into lexical analysis and syntax analysis; an example will illustrate both.

After an SQL command such as select * from T1 is sent to the parser, the parser validates and parses it. Lexical analysis breaks the text into the tokens select, *, from, and T1; syntax analysis then assembles them into a syntax tree.

Query optimizer

If the previous step finds no syntax errors, the statement is handed to the query optimizer. Before an SQL statement actually executes, MySQL may decide that your statement is not optimal and rewrite it. The execution plan you see with EXPLAIN is generated by the query optimizer!

For example: select * from tuser where name like 'a%' and id = 1;

This statement will be optimized. We will cover why it gets optimized later; for now, just know that it becomes something like this:

     select * from tuser where id = 1 and name like 'a%';

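To see the optimizer at work, you can ask MySQL for the execution plan with EXPLAIN. A minimal sketch, assuming a tuser table defined as below purely for illustration:

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE tuser (
  id   INT PRIMARY KEY,
  name VARCHAR(50),
  KEY idx_name (name)
) ENGINE=InnoDB;

-- The optimizer may reorder the conditions and choose the primary key,
-- since id = 1 is far more selective than name LIKE 'a%'.
EXPLAIN SELECT * FROM tuser WHERE name LIKE 'a%' AND id = 1;
```

In the EXPLAIN output you would expect to see the PRIMARY key chosen, regardless of the order in which you wrote the conditions.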

The query cache

Stores query results, keyed not by the SQL text itself but by a hash of the SQL statement. If an identical query comes in later, MySQL skips the pluggable storage engine and pulls the result straight out of the cache. (It proved not very useful in practice and has been removed in newer MySQL versions; it is no longer used.)
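On MySQL 5.7 and earlier you can still inspect the query cache through server variables and status counters; as noted above, MySQL 8.0 removed the feature entirely. A quick sketch:

```sql
-- MySQL 5.7 and earlier only; the query cache was removed in 8.0.
SHOW VARIABLES LIKE 'query_cache%';   -- query_cache_type, query_cache_size, ...
SHOW STATUS LIKE 'Qcache%';           -- cache hits, inserts, free memory, ...
```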

The storage engine

Pluggable storage engines: MySQL provides a variety of interchangeable storage engines. The engine determines how data is stored, how the stored data is indexed, how it is updated, and so on.

In MySQL, there are two main storage engines: MyISAM and InnoDB.

MyISAM is a high-speed engine with fast insertion and query performance. However, it does not support transactions or row-level locks.

InnoDB has been MySQL's default engine since version 5.5. It supports transactions, row-level locking, rollback, crash recovery, and multi-version concurrency control. It is slightly slower than MyISAM and supports foreign keys.

So how do we choose the storage engine type?

InnoDB: supports transactions, foreign keys, crash recovery, and concurrency control. InnoDB is the preferred option when transactional integrity matters (for example, banking) or concurrency control is required (for example, ticketing). If your workload involves frequent updates and deletes, InnoDB is also the right choice because it supports transaction commit and rollback.

MyISAM: fast data insertion with low space and memory usage. If a table is mainly used for inserting new records and reading them back, MyISAM handles that efficiently.
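The engine is chosen per table. A minimal sketch of how this looks in practice (the table names below are made up for illustration):

```sql
-- List the engines this server supports, and which one is the default.
SHOW ENGINES;

-- Choose the engine per table at creation time.
CREATE TABLE account (
  id      INT PRIMARY KEY,
  balance DECIMAL(10,2)
) ENGINE=InnoDB;   -- transactions, row locks, crash recovery

CREATE TABLE access_log (
  id  INT AUTO_INCREMENT PRIMARY KEY,
  msg VARCHAR(255)
) ENGINE=MyISAM;   -- fast inserts and reads, no transactions

-- An existing table can be converted as well.
ALTER TABLE access_log ENGINE=InnoDB;
```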

Here is a picture summarizing the differences between MyISAM and InnoDB:

Simplified execution flow chart

How do you remember the execution flowchart? Follow the order of the modules in the logical architecture diagram!

  1. Client: sends commands to the connector, which verifies permissions. Once verification succeeds, the client can continue sending SQL commands.
  2. The connector: Establishes connections with clients and obtains permissions.
    • If the username or password is incorrect, you will receive an error saying “Access denied for user”.
    • If the username and password pass, the connector will go into the permissions table and look up the permissions you have.
  3. Query cache: once the connection is established, you can run a SELECT statement, which brings us to the second step of the execution logic: the query cache. If the same query was cached before, its result is returned directly.
  4. Parser: If there is no hit to the query cache, the actual execution of the statement begins, with lexical analysis followed by parsing.
  5. Optimizer: After the parser, MySQL knows what you’re doing, and before you start executing, it needs to go through the optimizer. The optimizer determines which index to use when there are multiple indexes in a table.
  6. Executor: the parser knows what you want to do and the optimizer knows how to do it, so execution now enters the executor. Note: before executing, the executor checks whether you have permission on the table. If you do not, it returns a permission error; if you do, it opens the table and continues, calling the interface provided by that table's storage engine.
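The six steps above can be traced against a single query. A hedged sketch, where the connection details and the tuser table are illustrative assumptions:

```sql
-- 1-2. The client connects and the connector authenticates, e.g.:
--        mysql -h 127.0.0.1 -u app_user -p
-- 3.   On MySQL 5.7 and earlier, the SELECT below might be answered
--      straight from the query cache.
-- 4-5. Otherwise the parser builds a syntax tree and the optimizer
--      picks a plan (here: a primary key lookup on id).
-- 6.   The executor checks table privileges, then calls the storage
--      engine interface to fetch the row.
SELECT * FROM tuser WHERE id = 1;
```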

Detailed execution flow chart

That covers the simplified version of the execution flowchart, which may feel like enough. But for beginners there is also a detailed version of the execution flowchart, so Old Liu will walk through that process as well.

  1. After MySQL starts, the network interaction module passes the connection to the connection management module. Once the connection is up, it enters the connection thread module, and then the user module checks whether you have login permission. If the check passes, the result is returned to the connection management module and you are logged in.
  2. The SQL statement is then sent to the user module, which also checks whether you have permission to operate on the table. If you do, it moves on to the command dispatcher and then to the query cache module; if the same query was run before, the result is returned directly. (Meanwhile, when the command reaches the command dispatcher, it first goes through the logging module so the operation is logged.)
  3. Next the command reaches the command parser, which determines what kind of statement it is. Depending on the statement type, it is routed to a different module: the query optimizer, the table modification module, the table maintenance module, the replication module, or the status module.
  4. The SQL statement then reaches the access control module, which checks permissions one more time, this time the operation permissions (INSERT, UPDATE, and so on). If everything is in order, the statement enters the table management module, which calls the storage engine interface; the storage engine fetches the data from the file system (the data files), and the result travels back up the same path.

Now that the logic diagram has been walked through, if you straighten out the ideas and the logic behind it, you will remember it well.

Second, physical structure

MySQL is physically divided into log files and data/index files. On Linux, both are stored under the /var/lib/mysql directory. Log files are written with sequential I/O, while data files use random I/O.

Here's a question: why are log files stored with sequential I/O while data files use random I/O?

  • A brief introduction to sequential and random I/O: with sequential I/O, the storage space is physically contiguous, so appending content is highly efficient. With random I/O, the data is logically contiguous but physically scattered, so every operation must first seek to the file's location on disk.
  • Put simply, the advantage of sequential I/O is fast writing, since data can only be appended. This suits log files perfectly: log entries are append-only and never need modification. The drawback is wasted space. Data files, by contrast, are frequently modified and their storage addresses are not contiguous, which suits random I/O; random I/O saves space, but it is somewhat slower.

The log file

Let's go through each of the logs in the log files, which are the following:

Error log (errorlog)

Enabled by default. It records all critical errors as well as the details of each MySQL startup and shutdown.
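To find out where your server writes the error log (the path varies per installation), you can check the log_error variable:

```sql
-- Prints the error log path, e.g. /var/log/mysqld.log (varies by setup).
SHOW VARIABLES LIKE 'log_error';
```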

Binary log

This is so important that you must remember it!

Disabled by default. It records all DDL and DML statements in the database, but not SELECT statements. DDL statements are written to the binlog directly, while DML statements are written only when their transaction commits. It is mainly used for MySQL master/slave replication, data backup, and data recovery.
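A quick way to check whether the binary log is on, plus a typical my.cnf fragment for enabling it (the log base name and server-id below are illustrative choices):

```sql
-- Is the binary log enabled, and which binlog files exist?
SHOW VARIABLES LIKE 'log_bin';
SHOW BINARY LOGS;

-- To enable it, add to my.cnf / my.ini and restart the server:
--   [mysqld]
--   log-bin   = mysql-bin
--   server-id = 1
```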

General Query Log

Disabled by default. It records all user operations, including inserts, deletes, updates, and queries. Under heavy concurrency it generates a large amount of output, causing unnecessary disk I/O and hurting MySQL performance.
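If you do need the general query log temporarily, it can be toggled at runtime, for example:

```sql
SET GLOBAL general_log = 'ON';            -- start recording every statement
SHOW VARIABLES LIKE 'general_log%';       -- shows the on/off state and file path
SET GLOBAL general_log = 'OFF';           -- turn it back off to avoid the I/O cost
```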

Slow Query log

Disabled by default. It records every query whose execution time exceeds long_query_time seconds. Collecting these long-running SQL statements makes it easier to improve query performance.
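Both settings can be changed at runtime; for example (the one-second threshold is just an illustrative choice):

```sql
-- Runtime changes are lost on restart; put them in my.cnf to persist.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;       -- in seconds; slower queries get logged
SHOW VARIABLES LIKE 'slow_query%';    -- confirms the state and log file path
```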

Redo log

It is primarily used to guarantee transaction durability. If a crash occurs while there are dirty pages that have not yet been written to disk, MySQL replays the redo log on restart to reapply those changes, so committed transactions survive the failure.
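How aggressively InnoDB flushes the redo log to disk is tunable, trading durability for speed. A quick sketch:

```sql
-- 1 (the default): flush to disk at every commit - full durability.
-- 0 or 2: flush roughly once per second - faster, but a crash may
--         lose about the last second of transactions.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
```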

Rollback log (undo log)

It holds a version of the data prior to the transaction, can be used for rollback, and can also provide multi-version read under concurrent control (MVCC).

Relay log

As far as Old Liu knows, the relay log is useful in two places: MySQL master/slave replication, and canal synchronizing incremental MySQL data. The slave server's I/O thread reads the master's binary log and records it in a local file on the slave, the relay log. The slave's SQL thread then reads the relay log contents and applies them on the slave, keeping the slave's data consistent with the master's.
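Setting up a replica is what makes the relay log appear. A hedged sketch of the classic (pre-8.0.22) syntax, with host, user, password, and binlog coordinates as placeholder values:

```sql
-- Run on the slave. MySQL 8.0.22+ prefers CHANGE REPLICATION SOURCE TO.
CHANGE MASTER TO
  MASTER_HOST='192.0.2.10',
  MASTER_USER='repl',
  MASTER_PASSWORD='***',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;

-- The relay log file and position show up in the replication status:
SHOW SLAVE STATUS\G
```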

The data file

InnoDB data files

  • .frm file: stores table-related metadata, including the table structure definition.
  • .ibd file: with file-per-table tablespaces, stores one table's data and index information; each table corresponds to one .ibd file.
  • ibdata files: with the shared tablespace, table data and index information for all tables is stored together; all tables share one or more ibdata files.
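Which of the two layouts you get is controlled by innodb_file_per_table:

```sql
-- ON (the default since MySQL 5.6.6): each table gets its own .ibd file.
-- OFF: table data and indexes go into the shared ibdata tablespace.
SHOW VARIABLES LIKE 'innodb_file_per_table';
```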

MyISAM data files

  • .frm file: stores table-related metadata, including the table structure definition.
  • .MYD file: stores the table's data.
  • .MYI file: stores the index trees for the indexes on the table's data.

Conclusion

As the first MySQL article of the big data development guide, this piece has introduced MySQL's architecture in detail, covering each module and process. Old Liu hopes you will follow along with the article, carefully outline the ideas, and try to explain these points in your own words!

Old Liu's current level may not yet match yours, but he will keep striving to get better, so that friends who self-study never have to ask around again!

If you have any questions, please contact the public account: hardworking Old Liu. If you have read this far, a like and a follow would be greatly appreciated!