Most of the data that programs process today lives in databases, especially relational databases. So what does a database actually do when an SQL statement is submitted to it? If you can't answer that, you won't use the database well, and you won't handle the interviewer's questions well either. The most popular open-source database today is MySQL, and MySQL comes up in almost every interview, which is why I studied the column "MySQL Practice 45 Lectures". Today's article tries to answer the following two questions:

1. What does the database do after an SQL statement is submitted to it?

2. How can MySQL be restored to its state at a certain second on a certain day?


The connector

Before a client can submit an SQL statement, it must connect to the database, that is, log in with a username and password; this is the connector's job.

Once a connection is established, changes an administrator makes to the user's permissions do not affect it: the connection keeps the permissions it was granted at login, and the new settings only take effect after the user logs out and reconnects.

Memory resources are expensive, so idle connections have to be cleaned up. If a connection performs no operation for a certain period (8 hours by default, controlled by the wait_timeout parameter), the connector disconnects it automatically, and the next query on that connection returns an error.

A better solution is to use database connection pooling. Python programmers can use the third-party library DBUtils to manage database connection pools.
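A pool like DBUtils keeps a fixed set of open connections and hands them out on demand, instead of logging in for every request. The core idea can be sketched with the standard library alone; `MockConnection` below is a hypothetical stand-in for a real DB-API connection such as one returned by pymysql:

```python
import queue

class MockConnection:
    """Hypothetical stand-in for a real DB-API connection (e.g. pymysql)."""
    def __init__(self, conn_id):
        self.conn_id = conn_id

class ConnectionPool:
    """Minimal fixed-size pool: connections are created once at startup,
    borrowed for a query, and returned instead of being closed."""
    def __init__(self, factory, size=5):
        self._pool = queue.Queue(maxsize=size)
        for i in range(size):
            self._pool.put(factory(i))

    def acquire(self, timeout=None):
        # Blocks until a connection is free, so callers can't exceed `size`.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(MockConnection, size=3)
conn = pool.acquire()
# ... with a real driver you would run queries via conn.cursor() here ...
pool.release(conn)
```

With DBUtils itself you would wrap the real driver (roughly `PooledDB(creator=pymysql, ...)`) instead of this toy class.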

The query cache

The query cache lets a repeated query return immediately: it uses the SQL statement text as the key and the query result as the value, so an identical second query is answered straight from memory.

If your statement matches a key in the cache and you have permission on the tables involved, the value is returned directly to the client.

If not, execution continues through the stages below, and once the result is produced it is also stored into the query cache. Note that any update to a table invalidates every cached query on that table, so the cache rarely pays off on write-heavy workloads; it was removed entirely in MySQL 8.0.
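A toy sketch of the key/value behavior described above, including the invalidation that makes the cache fragile (all names and results here are illustrative):

```python
class QueryCache:
    """Toy model of the query cache: SQL text is the key, the result set
    is the value, and any write to a table drops every entry that used it."""
    def __init__(self):
        self._cache = {}  # sql text -> (tables it reads, result rows)

    def get(self, sql):
        entry = self._cache.get(sql)
        return entry[1] if entry else None

    def put(self, sql, tables, rows):
        self._cache[sql] = (tables, rows)

    def invalidate(self, table):
        # An UPDATE/INSERT/DELETE on `table` wipes all queries touching it.
        self._cache = {sql: v for sql, v in self._cache.items()
                       if table not in v[0]}

cache = QueryCache()
cache.put("select * from T where ID=10", {"T"}, [(10, "hello")])
hit = cache.get("select * from T where ID=10")   # second identical query
cache.invalidate("T")                            # a write to T arrives
miss = cache.get("select * from T where ID=10")  # cache entry is gone
```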

The analyzer

If the query cache is missed, the SQL statement is passed to the analyzer, which first performs lexical analysis (identifying keywords, table names, and field names) and then syntax analysis to check for grammar errors. (If you need a standalone SQL parser in Java, the Druid library provides one.)

After the table names are resolved, the user's permissions on those tables are checked; if the check passes, the statement moves on to the optimizer.

The optimizer

After the analyzer, MySQL knows what you want to do.

Before execution begins, the statement is processed by the optimizer. The optimizer decides which index to use when a table has several, and in what order to join tables when a statement joins more than one.
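A minimal sketch of the kind of decision the optimizer makes, reduced here to "pick the plan that examines the fewest rows" (the index names and row estimates are made up; the real cost model also weighs I/O, sorting, temporary tables, and more):

```python
def choose_index(plans):
    """Toy cost model: pick the index whose plan examines the fewest rows."""
    return min(plans, key=lambda p: p[1])[0]

# (index name, estimated rows to examine) -- all numbers are made up
plans = [("PRIMARY", 100_000), ("idx_age", 1_200), ("idx_city", 4_500)]
best = choose_index(plans)
```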

The executor

MySQL knows what you want to do from the analyzer and how to do it from the optimizer, so it enters the executor phase and starts executing the statement. If you do not have query permission on table T, the executor returns a permission error.

You might ask: wasn't permission already verified earlier? It is verified again here because more than the SQL text can be involved: stored procedures, triggers, and similar objects may read other tables at run time, and the earlier phase cannot know which ones. For triggers in particular, the check can only happen during execution.

For example:

select * from T where ID=10;

If the ID field has no index, the executor proceeds as follows:

  1. Call the InnoDB engine interface to fetch the first row of the table; if its ID value is 10, save the row in the result set, otherwise skip it.
  2. Call the engine interface to fetch the "next row" and repeat the same logic until the last row of the table has been fetched.
  3. Return the accumulated result set of all matching rows to the client.

At this point, the statement is complete.
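The executor's loop can be sketched in a few lines, with the engine modeled as a plain list of row dicts (a toy model, not InnoDB's actual interface):

```python
def full_scan(engine_rows, predicate):
    """Executor loop without an index: ask the engine for rows one at a
    time and keep those matching the WHERE condition."""
    result = []
    for row in engine_rows:  # "fetch first row", then "fetch next row"
        if predicate(row):
            result.append(row)
    return result

# The engine modeled as a plain list of rows
table_t = [{"ID": 9, "v": "a"}, {"ID": 10, "v": "b"}, {"ID": 10, "v": "c"}]
rows = full_scan(table_t, lambda r: r["ID"] == 10)
```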

The logic is similar for an indexed table: the first call is to the "fetch the first row that satisfies the condition" interface, and then the "fetch the next row that satisfies the condition" interface is called repeatedly; both are defined inside the engine.

Speaking of storage engines, MySQL supports InnoDB, MyISAM, Memory, and many others. The most commonly used is InnoDB, which has been the default since MySQL 5.5.5. In other words, if you do not specify an engine type in CREATE TABLE, InnoDB is used. You can select another engine by specifying its type, for example ENGINE=MEMORY in the CREATE TABLE statement to create a table with the Memory engine. Different storage engines store table data differently and support different features.

Next, let's look at how a write operation is executed, and what the redo log and the binlog do.

The write operation

First, it is worth saying that an update statement goes through the same set of stages as a query statement.

Unlike the query flow, the update flow also involves two important log modules: the redo log and the binlog. If you work with MySQL, you cannot get around these two terms. Both embody design ideas that you can apply to your own programs.

Take the update operation as an example. Suppose the SQL statement is:

update table_a set count = count + 1 where id = 2;
  1. The executor asks the engine for the row with id = 2. Since id is the primary key, the engine finds the row directly with a tree search. If the page holding this row is already in memory, the row is returned to the executor immediately; otherwise it is first read from disk into memory.
  2. The executor takes the row the engine returned, adds 1 to the count value (N becomes N + 1) to produce a new row, and calls the engine interface to write it.
  3. The engine updates the data in memory and records the change in the redo log, leaving the redo log entry in the prepare state. It then tells the executor that it is done and the transaction can be committed at any time.
  4. The executor generates a binlog record of the operation and writes the binlog to disk.
  5. The executor calls the engine's commit-transaction interface, and the engine changes the redo log entry to the commit state.
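The five steps above can be simulated in a few lines; this is a toy model of the ordering only (prepare, then binlog, then commit), not of real log formats:

```python
class TwoPhaseCommit:
    """Toy model of the ordering in steps 3-5: redo log to 'prepare',
    binlog written, then redo log switched to 'commit'."""
    def __init__(self):
        self.redo_log = []  # [txn_id, state] entries
        self.binlog = []    # txn_ids whose binlog is on disk

    def execute_update(self, txn_id):
        self.redo_log.append([txn_id, "prepare"])  # step 3: prepare
        self.binlog.append(txn_id)                 # step 4: write binlog
        for entry in self.redo_log:                # step 5: commit
            if entry[0] == txn_id:
                entry[1] = "commit"

db = TwoPhaseCommit()
db.execute_update(42)
```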

Note that the redo log and the binlog are log files persisted to disk so that data survives unexpected restarts and power failures and can be recovered. To avoid hitting the disk too often, redo log records are first written to an in-memory redo log buffer and flushed to disk at intervals.

But there are differences between the two log files:

  1. The redo log is specific to InnoDB; the binlog is implemented in MySQL's Server layer and is available to all engines.
  2. The redo log is a physical log recording what was changed on a data page; the binlog is a logical log recording the original logic of the statement, such as "add 1 to the count field of the row with id = 2".
  3. The redo log is written circularly within a fixed amount of space; the binlog is append-only: when a binlog file reaches a certain size, writing switches to the next file, and earlier logs are never overwritten.
  4. The redo log is what makes crash recovery possible; the binlog is used for archiving, replication, and point-in-time recovery.

How to restore MySQL to a certain second on a certain day

To do this, you need to make regular full backups of the MySQL database. How often depends on how important the system is; the interval can be a day or a week.

Suppose that at 2 p.m. you discover a table was dropped by mistake at noon and you need the data back. You can proceed as follows:

  1. First, find the most recent full backup (if you're lucky, last night's) and restore it to a temporary instance.
  2. Then, starting from the backup's point in time, replay the archived binlogs in order, up to the moment just before the table was dropped at noon. The temporary instance is now in the same state as the online database just before the mistake.
  3. Finally, take the table's data out of the temporary instance and restore it to the online database as needed.
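The procedure boils down to "restore the backup, then replay events that happened before the bad statement". Binlog events are modeled here as simple (timestamp, table, rows) tuples, which is far simpler than the real binlog format; in practice you would use the mysqlbinlog tool with its --stop-datetime option:

```python
def point_in_time_restore(backup, binlog_events, stop_before):
    """Restore the full backup, then replay binlog events in order,
    stopping just before the timestamp of the mistaken statement."""
    state = dict(backup)                    # step 1: restore the backup
    for ts, table, rows in binlog_events:   # step 2: replay the binlog
        if ts >= stop_before:
            break
        state[table] = rows
    return state

backup = {"T": ["r1"]}                      # last night's full dump
events = [(9, "T", ["r1", "r2"]),           # morning insert
          (12, "T", []),                    # noon: table wiped by mistake
          (13, "T", ["junk"])]
restored = point_in_time_restore(backup, events, stop_before=12)
```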

Why two-phase commit

In the update flow above, the redo log was written first and left in the prepare state; then the binlog was written; only then was the redo log switched to the commit state.

Why wait for the binlog to be written before committing? Because if MySQL restarted unexpectedly before the binlog was written, the binlog would be missing this record; when the binlog was later replayed during master/slave replication (or to rebuild an instance), the slave would end up inconsistent with the master.

Conversely, if the binlog were written first and the crash happened before the redo log was written, the transaction would be rolled back on restart because there is no redo record, yet the binlog would already contain it, again producing an inconsistency when the binlog is replayed.

How does MySQL rollback and restore data?

InnoDB's redo log is persisted on disk, but there is also an in-memory buffer, the redo log buffer. InnoDB runs a background thread that, every second, writes the contents of the redo log buffer to the file system's page cache and then calls fsync to persist them to disk.

This redo log buffer -> page cache -> disk flush happens every second. If an abnormal restart occurs, the database can recover from the redo log. So how exactly does that work?
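The flush pipeline can be sketched as two steps, a write() into the OS page cache followed by an fsync() to disk (modeled here with plain lists; real InnoDB flushing behavior is governed by settings such as innodb_flush_log_at_trx_commit):

```python
def flush_redo_log(redo_log_buffer, page_cache, disk):
    """One pass of the background thread: write() moves buffered records
    into the OS page cache, then fsync() makes the page cache durable."""
    page_cache.extend(redo_log_buffer)  # write(): user space -> page cache
    redo_log_buffer.clear()
    disk.extend(page_cache)             # fsync(): page cache -> disk
    page_cache.clear()

redo_log_buffer, page_cache, disk = ["rec1", "rec2"], [], []
flush_redo_log(redo_log_buffer, page_cache, disk)
```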

Before a transaction commits, its redo log record is written in the prepare state.

After a transaction commits, its redo log record is in the commit state.

If an abnormal restart occurs while a transaction is still in the prepare state, MySQL rolls that transaction back after recovery.

If the record is in the commit state, the change is already durable, and the transaction survives the restart.

If the crash happens before the prepare record is even written, nothing has been changed yet, so restarting does no harm.

The redo log alone only guarantees that committed data is not lost; it cannot restore the data to an arbitrary earlier state.

That requires the binlog, MySQL's archive log.

If MySQL restarts unexpectedly before the binlog is written, it rolls back the transaction in the prepare state after recovery.

If the restart happens after the binlog is written, MySQL checks on recovery whether the transaction's binlog record exists and is complete:

A. If yes, commit the transaction. B. Otherwise, roll back the transaction.
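The recovery rule above boils down to a small decision function (the state names here are illustrative labels, not MySQL internals):

```python
def crash_recovery(redo_state, binlog_complete):
    """Decision rule after an unexpected restart, as described above.
    `redo_state` is one of 'none', 'prepare', 'commit'."""
    if redo_state == "commit":
        return "commit"      # already durable: nothing to undo
    if redo_state == "prepare":
        # The binlog is the tiebreaker: complete -> commit, else roll back
        return "commit" if binlog_complete else "rollback"
    return "rollback"        # crashed before prepare: no data was written

# A prepared transaction with a complete binlog is committed on recovery
decision = crash_recovery("prepare", True)
```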

You may wonder why MySQL was designed to commit, after a restart, a prepared transaction whose binlog is complete.

A: Because at that point the binlog has already been written, and it may already have been consumed by a slave (or used to rebuild an instance). To stay consistent with them, the master must commit the transaction as well. This strategy guarantees consistency between the primary and its replicas.

One more question: why not simply use the redo log as the binlog?

Because the redo log is written circularly: once it wraps around, earlier records are overwritten. It therefore cannot record operations over a long period, cannot retain history, and cannot serve as an archive.

Another reason is that the MySQL ecosystem depends on the binlog. The binlog has been a MySQL feature since the beginning and is used in many ways: MySQL high availability is built on binlog replication, and many companies have heterogeneous systems (such as data analysis platforms) that consume the binlog to update their own data. Turning the binlog off would cut these downstream systems off from their data feed.

The last word

The elegance of MySQL lies in the combination of the redo log and the binlog: together, given a full backup, they let the system survive unexpected restarts and be restored to any second of any day. The same questions are worth asking of your own systems: if they accept user input, how do you make sure that input survives an abnormal restart or a power failure, and how do you quickly restore a configuration to its state before a given day?

In addition, MySQL is a must-ask in interviews. If you are looking for a job, it pays to prepare. I have gone through all of "MySQL Practice 45 Lectures", and the column is of high quality.

If you have any questions, please leave a comment.