InnoDB row locks: how do you lock a record that does not exist? The previous article left this question as a teaser, and the response in the comments was stronger than expected, so let's dig deeper. The original plan was to close out that case with one piece on InnoDB locks; since interest is so high, I will simply write up InnoDB concurrency control, locking, and the transaction model.

 

The topic is large and cannot be covered in a single article, so let me explain the overall context in plain language.

 

1. Concurrency control

Why concurrency control?

When concurrent tasks operate on the same critical resource, data inconsistency can arise if nothing is done about it, so concurrency control is a must.

Technically, how is concurrency control usually done?

Common means to ensure data consistency through concurrency control are:

  • Locking

  • Data multi-versioning

 

2. Locks

How do ordinary locks ensure consistency?

The most common usage of ordinary locks:

(1) Before operating on the data, acquire a lock for mutual exclusion, so that no other concurrent task can operate on it;

(2) After the operation completes, release the lock so that other tasks can proceed;

thereby ensuring consistency.
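The acquire-operate-release pattern above can be sketched with a toy Python example (illustrative only, not database internals; the counter, thread count, and iteration count are invented for the demo):

```python
import threading

counter = 0
lock = threading.Lock()

def task():
    global counter
    for _ in range(10000):
        with lock:          # (1) lock before operating on the shared resource
            counter += 1    # critical section: mutually exclusive
        # (2) lock is released on leaving the with-block; other tasks proceed

threads = [threading.Thread(target=task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: consistent despite 4 concurrent tasks
```

Without the lock, the `counter += 1` read-modify-write could interleave between tasks and the final count would be unpredictable.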

What are the problems with ordinary locks?

Ordinary locking is too harsh: it serializes everything, so even “read tasks”, which could safely run in parallel, must execute one at a time.

Hence shared and exclusive locks:

  • Share locks (S locks), taken when reading data

  • Exclusive locks (X locks), taken when modifying data

 

Shared and exclusive locks work as follows:

  • Shared locks are not mutually exclusive with each other; reads can proceed in parallel

  • An exclusive lock is mutually exclusive with any other lock; in short, write-read and write-write cannot proceed in parallel
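These compatibility rules can be captured in a tiny matrix (a sketch; only the S/X lock names come from the text above):

```python
# Lock compatibility matrix: S (shared) vs X (exclusive).
# True means the two locks may be held on the same data concurrently.
COMPATIBLE = {
    ('S', 'S'): True,   # read-read: parallel
    ('S', 'X'): False,  # read-write: not parallel
    ('X', 'S'): False,  # write-read: not parallel
    ('X', 'X'): False,  # write-write: not parallel
}

def can_coexist(held, requested):
    """Return True if `requested` can be granted while `held` is held."""
    return COMPATIBLE[(held, requested)]

print(can_coexist('S', 'S'))  # True: shared locks are not mutually exclusive
print(can_coexist('X', 'S'))  # False: exclusive conflicts with everything
```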

 

As you can see, while a write task has not completed, the data cannot be read by any other task, which greatly hurts concurrency.

Voiceover: In database terms, while a write transaction is uncommitted, any select that reads the related data is blocked.

 

Is it possible to improve concurrency even further?

If read tasks could proceed concurrently even while a write task is incomplete, concurrency would improve further. This leads to multiple versions of the data.

 

3. Multiple versions of data

Multiple versions of data is a way to further improve concurrency. Its core principles are:

(1) When a write task arrives, clone a copy of the data, distinguished by a version number;

(2) The write task operates on the newly cloned data until it commits;

(3) Concurrent read tasks can continue reading the old version of the data without being blocked.

For example:

1. The initial data version is V0;

2. At T1, a write task starts: it clones a copy of the data, marks it version V1, and has not yet completed;

3. At T2, a concurrent read task arrives, and it can still read the V0 version of the data;

4. At T3, another concurrent read task arrives, and it is likewise not blocked;

 

As you can see, with multiple versions of the data, task concurrency can be greatly improved by “reading old version data”.
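A toy model of this versioned read (purely illustrative; the class and its fields are invented for the demo, not InnoDB's actual data structures):

```python
# Toy multi-version store: each key maps to a list of (version, value)
# entries; readers pick the newest *committed* version, so an in-flight
# write (a newer, uncommitted version) never blocks them.
class MVStore:
    def __init__(self):
        self.versions = {}      # key -> [(version, value), ...]
        self.committed = -1     # highest committed version number

    def write(self, key, value, version):
        # clone-on-write: append a new version, old ones stay readable
        self.versions.setdefault(key, []).append((version, value))

    def commit(self, version):
        self.committed = version

    def read(self, key):
        # read the newest version that is already committed
        visible = [(v, val) for v, val in self.versions[key]
                   if v <= self.committed]
        return max(visible)[1]

store = MVStore()
store.write('row1', 'V0', version=0)
store.commit(0)
store.write('row1', 'V1', version=1)   # T1: write starts, not yet committed
print(store.read('row1'))              # V0: T2/T3 readers are not blocked
store.commit(1)
print(store.read('row1'))              # V1: visible only after commit
```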

The evolution of ideas for improving concurrency:

  • Ordinary locks: execution is essentially serial

  • Read-write locks: reads can run in parallel

  • Data multi-versioning: reads and writes can run in parallel

Voiceover: This idea matters more than any technical detail in this article; keep it in mind.

OK, so how does InnoDB actually do it?

 

Redo, undo, and rollback segments

Before going further into how InnoDB greatly improves task concurrency, we need to introduce redo logs, undo logs, and rollback segments.

 

Why do redo logs exist?

After a database transaction commits, its updates must eventually be flushed to disk to guarantee durability. But random disk writes perform poorly; flushing on every commit would badly hurt database throughput.

 

The optimization: write the change to the redo log first (a sequential write at this point), then periodically flush the data pages to disk. This greatly improves performance.

Voiceover: The architectural idea here, turning random writes into sequential writes, matters more than the details.

 

If the database crashes before the data is flushed, the redo log is replayed after restart, ensuring that the effects of committed transactions reach the disk.

 

In short, the redo log guarantees the durability of committed transactions (the D in ACID).
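The redo idea can be sketched as a toy write-ahead log (illustrative only; the record format, page flushing, and recovery here are vastly simplified inventions for the demo):

```python
# Write-ahead-logging sketch: changes go to an append-only redo log first
# (cheap sequential write); the data "disk" is flushed lazily. After a
# crash, replaying the log restores every committed change.
redo_log = []   # append-only, sequential: cheap to write on every commit
disk = {}       # data pages, flushed lazily by a background task

def commit(txn_changes):
    for key, value in txn_changes.items():
        redo_log.append((key, value))   # 1. durable sequential log write
    # 2. data pages are NOT flushed here; that happens later, in bulk

def recover():
    # redo: replay the log so committed changes reach the data pages
    for key, value in redo_log:
        disk[key] = value

commit({'id=3': 'xxx'})
commit({'id=4': 'wangwu'})
# -- crash before the background flush: `disk` is still empty --
recover()
print(disk)   # {'id=3': 'xxx', 'id=4': 'wangwu'}
```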

 

Why do undo logs exist?

Before a database transaction commits, a mirror of the data it modifies (that is, the old version before modification) is stored in the undo log. If the transaction rolls back, or the database crashes, the undo log, i.e. the old-version data, can be used to undo the uncommitted transaction's effect on the database.

Voiceover: In more detail:

For insert operations, the undo log records the PK (ROW_ID) of the new row; rollback simply deletes it.

For delete/update operations, the undo log records the old row; rollback restores it directly.

The two kinds of undo records are stored in separate buffers.

 

In short, undo logs ensure that uncommitted transactions do not compromise the database's ACID properties.

 

What is a rollback segment?

The rollback segment is where undo logs are stored.

 

Undo logs and rollback segments are closely related to InnoDB's MVCC. Here is an example to illustrate.

 

Example:

t(id PK, name);

 

Data as follows:

1, shenjian

2, zhangsan

3, lisi

No transaction is uncommitted at this point, so the rollback segment is empty.

 

Then a transaction is started:

start trx;

delete (1, shenjian);

update (3, lisi) to (3, xxx);

insert (4, wangwu);

And the transaction is in an uncommitted state.

 

You can see:

(1) The pre-delete data (1, shenjian) enters the rollback segment as old-version data;

(2) The pre-update data (3, lisi) enters the rollback segment as old-version data;

(3) For the insert, only the PK (4) enters the rollback segment;

 

Next, if the transaction rolls back, the undo logs in the rollback segment are used to undo its changes.

Voiceover: If the transaction commits instead, the undo logs in the rollback segment can be purged.

 

You can see:

(1) The deleted old data is restored;

(2) The modified old data is restored as well;

(3) The inserted data is deleted;

The transaction rolls back successfully, and everything is as it was.
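The whole example can be simulated with a toy undo log (a sketch using plain Python structures; real undo records are binary and far more compact):

```python
# Toy undo-log rollback for the example table t(id PK, name).
table = {1: 'shenjian', 2: 'zhangsan', 3: 'lisi'}
undo_log = []   # the "rollback segment": how to undo each change

# start trx;
undo_log.append(('put', 1, 'shenjian'))  # undo of delete: re-insert old row
del table[1]                             # delete (1, shenjian)

undo_log.append(('put', 3, 'lisi'))      # undo of update: restore old row
table[3] = 'xxx'                         # update (3, lisi) to (3, xxx)

undo_log.append(('del', 4, None))        # undo of insert: only the PK needed
table[4] = 'wangwu'                      # insert (4, wangwu)

# rollback: apply undo records in reverse order
for op, pk, old in reversed(undo_log):
    if op == 'del':
        del table[pk]       # remove the inserted row
    else:
        table[pk] = old     # restore the deleted/updated row

print(table == {1: 'shenjian', 2: 'zhangsan', 3: 'lisi'})  # True
```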

 

4. InnoDB is a storage engine based on multi-version concurrency control

InnoDB is the most recommended storage engine for high-concurrency Internet scenarios because of its Multi Version Concurrency Control (MVCC). Row locking, concurrency, transaction rollback, and many other features are related to MVCC.

 

MVCC reduces the lock conflicts of concurrent transactions and improves the concurrency of tasks by “reading old version data”.

 

Core issues:

Where is the old version data stored?

Does storing old versions of data heavily impact the existing architecture of MySQL and InnoDB?

With the undo logs and rollback segments introduced above, these two questions are easy to answer:

(1) The old-version data is stored in the rollback segment;

(2) The impact on the existing architecture of MySQL and InnoDB is small;

 

The InnoDB kernel adds three hidden fields to every row of data:

DB_TRX_ID, 6 bytes: the ID of the transaction that last modified the row;

DB_ROLL_PTR, 7 bytes: a pointer to the row's undo log record in the rollback segment;

DB_ROW_ID, 6 bytes: a monotonically increasing row ID;

 

How can InnoDB achieve such high concurrency?

The data in the rollback segment is in effect a snapshot of historical data. It will not be modified, so selects can read it concurrently without any locking restrictions.

 

Snapshot Read, a Consistent Nonlocking Read, is one of the core reasons for InnoDB’s high concurrency.

 

Consistency here means that a transaction reads either data that already existed before it started (produced, of course, by other committed transactions) or data inserted or modified by the transaction itself.
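A minimal sketch of such a visibility check, assuming a simplified read view that only tracks which transaction IDs had committed when the snapshot was taken (the function, its parameters, and the transaction IDs are all invented for the demo):

```python
# Visibility sketch: each row version carries the id of the transaction
# that wrote it (like InnoDB's hidden DB_TRX_ID). A snapshot read sees a
# version only if its writer had committed before the snapshot was taken,
# or if the reader wrote it itself.
def visible(row_trx_id, committed_before_snapshot, my_trx_id):
    return row_trx_id == my_trx_id or row_trx_id in committed_before_snapshot

# Transactions 10 and 11 committed before our snapshot; 12 is in flight.
snapshot = {10, 11}
print(visible(10, snapshot, my_trx_id=13))  # True: committed before snapshot
print(visible(12, snapshot, my_trx_id=13))  # False: uncommitted other trx
print(visible(13, snapshot, my_trx_id=13))  # True: our own modification
```

Real InnoDB read views are more involved (they record low/high watermarks and the set of active transactions), but the principle is the same: decide per row version, using DB_TRX_ID, whether the reader may see it.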

 

What kind of SELECT is a snapshot read?

Unless locks are explicitly requested, ordinary SELECT statements are snapshot reads, for example:

select * from t where id>2;

 

By contrast, these explicitly locking reads are not snapshot reads:

select * from t where id>2 lock in share mode;

select * from t where id>2 for update;

 

So the questions arise: what exactly are these explicitly locking reads? What locks do they take? And how do they relate to the transaction isolation level?

That’s enough for this section, but we’ll break it down next time.

 

Conclusion

(1) Common concurrency-control means of ensuring data consistency are locking and data multi-versioning;

(2) Ordinary locks serialize everything; read-write locks allow reads to run in parallel; data multi-versioning allows reads and writes to run in parallel;

(3) Redo logs guarantee the durability of committed transactions, trading random writes for sequential writes to improve performance;

(4) Undo logs are used to roll back uncommitted transactions; they are stored in the rollback segment;

(5) InnoDB is an MVCC-based storage engine that uses the old-version data in the rollback segment, i.e. the undo logs, to improve concurrency;

(6) A core reason for InnoDB's high concurrency is that snapshot reads take no locks, and all ordinary selects in InnoDB are snapshot reads;

Voiceover: This article is based on MySQL 5.6.

I hope the ideas came across. I will continue digging into InnoDB locks in the next article.

If you'd like popular technical articles to reach more people, please help share this one.

Related articles:

InnoDB, 5 best practices, Know why?