Have you thought about the following?

Where is all the data that we DELETE from MySQL?

Do you even have to ask? Of course it was deleted

If transaction A deletes the data with id=1 and transaction B reads the data with id=1, what data will MVCC return to the user if the data is deleted?

Yes, it is necessary to understand the principle of MySQL multi-version concurrency related to things, you can read my previous article.

So, in reality, the data that is deleted by calling the DELETE statement is not actually physically deleted, it is actually still there, but it is marked with a marker, indicating that it has been deleted.

This is actually with our daily operation – soft delete, almost the same meaning

In MySQL, the UPDATE and DELETE operations are essentially the same. A DELETE operation merely marks a particular bit in a row as deleted.

So here’s the problem: if all this data keeps piling up, won’t the hard drive burst sooner or later?

If all is said and done, can MySQL be used in production environments on the scale it is now? How does MySQL work?

This is where the Purge operation comes in.

What is the Purge operation?

The Purge operation is the one that actually physically deletes the data (which has been marked as deleted).

The Purge operation targets a data object, not just a row, but also its index data and Undo Log.

Okay, so here we go again.

So the question is, when does Purge execute? In fact, you can think of the Purge thread (the Purge thread for short) as a thread that periodically executes in the background.

There can be one or more Purge threads, and the number of Purge threads can be configured using the innodb_purge_Threads configuration item in MySQL. Of course, I’m sure you don’t remember configuring this when using MySQL because innodb_purge_Threads has a default value of 4.

InnoDB will allocate based on the number of tables in MySQL and the number of Purge threads.

But because of this feature, the number of Purge threads needs to be adjusted according to business conditions. For example, suppose the DML operations are concentrated on a table, such as Table 1…

Hold on a second, I’ll interrupt……

What is a DML operation? Always like to make some complicated nouns… DML Data Manipulation Language (DML) Data operation statements, in fact, is CRUD add, delete, modify and check…

Similar concepts include DDL (Data Definition Language) Data Definition statements, namely CREATE, DROP, ALTER, and so on.

And DCL (Data Control Language) Data Control statements, GRANT, REVOKE, etc…

Again, the number of Purge threads is configurable, but it’s not as many as you want. Otherwise, if you give it 10000 threads, it will go straight to OOM.

Innodb_purge_threads Max is 32, so InnoDB is actually starting 32 Purge threads. For a simple example, let’s say we only have one table, and then we configure 32 Purge threads.

You look at this picture and you ask yourself, is this the river? This not only wastes system resources, but also causes the different Purge threads to compete for data. Not only that, but the Purge thread could also compete with the user thread.

However, when you have 32 tables, the situation is different: one Purge thread equals one table, so there is no data competition between the threads, no waste of system resources, and better performance for the Purge operation.

This is why InnoDB adjusts the number of Purge threads in MySQL according to the actual situation, so we need to configure them accordingly.

For example, if you have a database where add/delete operations are concentrated on a few tables, consider setting Innodb_purge_Threads a little lower. On the other hand, innodb_purge_Threads can be set to a larger size if the add, delete, and change operations occur on almost every table.

Now that we know about the Purge thread itself, we can learn about the objects that the Purge thread is targeting. The Purge thread cleans mainly Undo Logs, followed by row records.

Because Undo Log can be divided into:

  • Insert Undo Log
  • Update Undo Log

So it’s more accurate to say that the Purge thread is cleaning up the Update Undo Log and row records, because the Insert Undo Log will be deleted after the transaction commits.

InnoDB’s MVCC data source is a single linked list of Undo logs. The Purge thread is used to periodically clean up the Undo logs and remove the rows when deleting the Undo logs.

So how many conditional Undo logs does the Purge thread read at a time?

Obviously, it does not depend on the mood of the moment to decide how many. It is controlled by the configuration item innodb_purge_batch_size, which defaults to 300. InnoDB then allocates the 300 Undo logs to the Innodb_purge_Threads Purge thread. During the cleanup process, the Purge thread also frees files in the Undo Log tablespace.

This article is available on Github github.com/sh-blog. Welcome Star. Wechat search concern [SH full stack notes], reply [queue] to obtain MQ learning materials, including basic concept analysis and RocketMQ detailed source analysis, continuous update.

If you think this article is helpful to you, please click a like, close a note, share, leave a message.