When we write programs, there are many situations where thread safety must be considered: if a global variable can be modified by multiple threads of execution at the same time, there must be a mechanism that guarantees the accuracy and consistency of its value, and the most common such mechanism is a lock. The idea is easy to grasp. In the real world, when several people try to modify the same thing at once, some mechanism has to make them take turns. The same is true in the computer world, where multiple threads, or even multiple processes, may modify a variable at the same time and must be queued. And it is equally true in the database world: when multiple requests modify the same data record at the same time, the database must have a mechanism to serialize them, which is to say that a given record can only be modified by one request at a time.
Locking is one of the most important mechanisms in a database. Whether you write SELECT statements or UPDATE statements, they are closely tied to locking at the database level. Without locking, the following problems can occur when manipulating data:
- Lost update: when multiple users update the same data resource at the same time, one update overwrites another and is silently lost (see the sketch after this list).
- Non-repeatable read: if one user reads the same data several times within a transaction while another user updates it in between, the reads return inconsistent results.
- Dirty read: the first transaction reads data that the second transaction is updating but has not yet finished, so it sees a half-updated state that is meaningless.
- Phantom read: after the first transaction reads a result set, the second transaction inserts or deletes rows matching the same query, so when the first transaction runs the query again, rows have appeared or disappeared.
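To make the first of these concrete, here is how a lost update plays out between two concurrent sessions. This is a minimal sketch; the `accounts` table and its columns are hypothetical:

```sql
-- Hypothetical schema: accounts(id INT PRIMARY KEY, balance INT); id 1 holds 100.
-- Session A reads the balance:
SELECT balance FROM accounts WHERE id = 1;   -- returns 100
-- Session B reads the same balance before A writes back:
SELECT balance FROM accounts WHERE id = 1;   -- also returns 100
-- Session A deducts 30 based on the value it read:
UPDATE accounts SET balance = 70 WHERE id = 1;
-- Session B deducts 50 based on its stale copy:
UPDATE accounts SET balance = 50 WHERE id = 1;
-- Final balance is 50 instead of 20: Session A's update is silently lost.
```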
Data Management Perspective
From the perspective of database management, or of an individual data row, database locks can be divided into shared locks and exclusive locks, the two types most often brought up in interviews. The essence is simple. From the data's point of view, the question is: if the data is currently being accessed, how should the next access request be handled? Like a binary digit, there are only two states: access allowed and access denied.
Shared locks
A shared lock is also called a read lock or S lock. As mentioned above, a shared lock allows a new request to access the data if it is a read request, but blocks it if it is a write request. Because a shared lock still permits other read operations, it is normally applied only to SELECT operations; serious data inconsistencies can arise if only a shared lock is used for an UPDATE or DELETE operation.
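In MySQL's InnoDB engine, for example, a shared lock can be requested explicitly. A minimal sketch, reusing the hypothetical `accounts` table:

```sql
-- MySQL 8.0 / InnoDB: take a shared (S) lock on the matching row.
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR SHARE;  -- LOCK IN SHARE MODE before 8.0
-- Other transactions may still read this row (and take their own S locks),
-- but any attempt to UPDATE or DELETE it blocks until we COMMIT or ROLLBACK.
COMMIT;
```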
Exclusive locks
An exclusive lock is also called a write lock or X lock. Compared with a shared lock, an exclusive lock takes a firmer attitude: once data is locked by an exclusive lock, any other request, including a read request, must wait for the exclusive lock to be released before it can proceed.
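Again in InnoDB, an exclusive lock can be requested explicitly with FOR UPDATE; a hedged sketch:

```sql
-- MySQL 8.0 / InnoDB: take an exclusive (X) lock on the matching row.
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- Until COMMIT, other transactions can neither read-lock this row (FOR SHARE)
-- nor write to it. (Plain non-locking SELECTs are still served from InnoDB's
-- MVCC snapshot, so they do not block.)
COMMIT;
```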
Update locks
When the database prepares to update data, it first places an update lock on the data object, so the data cannot be modified but can still be read. Once the update operation is confirmed, the update lock is automatically converted to an exclusive lock. An update lock cannot be acquired while another update or exclusive lock is already held on the object, though it coexists with shared locks.
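Update locks are most visible in SQL Server, where the UPDLOCK table hint requests one explicitly. A hedged T-SQL sketch, with the same hypothetical table:

```sql
-- SQL Server: read a row under an update (U) lock. Plain readers holding
-- S locks are still allowed, but no other transaction can take a U or X lock.
BEGIN TRANSACTION;
SELECT balance FROM accounts WITH (UPDLOCK) WHERE id = 1;
-- When the row is actually modified, the U lock is converted to an X lock.
UPDATE accounts SET balance = balance - 30 WHERE id = 1;
COMMIT TRANSACTION;
```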
Intent locks
Put simply, an intent lock placed on a larger unit of storage indicates that something inside it is locked. For example, an intent lock at the table level signals that a transaction intends to take a shared or exclusive lock on a page or row of that table, which prevents other transactions from acquiring an incompatible lock on the whole table. Intent locks improve performance: the engine only needs to check the intent lock at the table level to decide whether a compatible lock can be granted, instead of examining every row and page of the resource. There are three kinds: intent shared locks (IS), intent exclusive locks (IX), and shared with intent exclusive locks (SIX).
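In MySQL 8.0 the intent locks can be observed directly: a row-level X lock taken by FOR UPDATE is accompanied by a table-level IX lock. A sketch, again assuming the hypothetical `accounts` table:

```sql
-- Session A: lock one row exclusively.
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- Session B: inspect the locks InnoDB currently holds (MySQL 8.0+).
SELECT object_name, lock_type, lock_mode
FROM performance_schema.data_locks;
-- Expect one TABLE lock in mode IX on accounts, plus one RECORD lock
-- in mode X,REC_NOT_GAP on the matching row.
```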
In practice, whatever the lock type, the rule being enforced is the same: only one write operation can be performed on a given piece of data at a time.
The Granularity Perspective
Locks are used to lock data. According to the granularity of the locked object, we can divide locks into row locks, page locks, and table locks; the sketch after this list shows the two extremes in MySQL.
- Row-level locks are the finest-grained locks in a database: only the current row is locked. Row-level locking greatly reduces conflicts between database operations; its granularity is the smallest, but its locking cost is the largest. Characteristics: high overhead, slow to acquire; deadlocks can occur; the smallest granularity, the lowest probability of lock conflicts, and the highest concurrency.
- Table-level locks are the coarsest-grained locks in a database: the entire table involved in the current operation is locked. They are simple to implement and consume few resources. Characteristics: low overhead, fast to acquire; no deadlocks; large granularity, the highest probability of lock conflicts, and the lowest concurrency.
- Page-level locks sit between row-level and table-level locks in granularity. Table-level locking is fast but conflicts often, while row-level locking conflicts rarely but is slow, so page-level locking takes the compromise of locking an adjacent group of records at a time. Characteristics: overhead and locking time between those of table and row locks; deadlocks can occur; granularity between table and row locks, with average concurrency.
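The difference in granularity is easy to demonstrate in MySQL. A hedged sketch:

```sql
-- Table-level lock (supported by every MySQL engine, including MyISAM):
LOCK TABLES accounts WRITE;   -- no other session can read or write the table
UPDATE accounts SET balance = balance - 30 WHERE id = 1;
UNLOCK TABLES;

-- Row-level lock (InnoDB): only the matching row is locked. Note that InnoDB
-- row locks are taken on index entries; if the WHERE clause cannot use an
-- index, InnoDB ends up locking every row it scans.
BEGIN;
UPDATE accounts SET balance = balance - 30 WHERE id = 1;  -- X lock on one row
COMMIT;
```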
Different databases support different lock granularities, and even different storage engines of the same database differ. In MySQL, for example, MyISAM supports only table-level locks, while InnoDB supports both row-level locks (the default) and table-level locks.
It should be emphasized that no matter which database locks the data, maintaining locks consumes resources, so the number of locks is effectively capped. When the number of locks reaches that limit, the lock granularity is automatically escalated: many fine-grained locks are replaced by fewer coarser-grained ones.
Optimistic locks and pessimistic locks
Optimistic locking
Optimistic locking assumes that, in general, data will not conflict, so it checks for conflicts only when an update is submitted. If there is no conflict, fine; if there is a conflict, an error is returned and the user decides what to do. This is similar to SVN and Git: when you commit a modified file, the tool checks whether the version you started from matches the one on the server. If it matches, the commit goes through directly; if not, you must first pull the latest version from the server and then commit.
Optimistic locking is a program-design idea: comparing a marker determines whether the data may be operated on. The common practice today is to add a version number or timestamp to the data. The flow: design a version field in the table; on the first read, record the value of the version field; when the data is later updated or deleted, issue UPDATE ... SET version = version + 1 WHERE version = <the value originally read>. If another transaction has already changed the row in the meantime, the version no longer matches and the update does not succeed.
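Spelled out in SQL, against the same hypothetical `accounts` table extended with a `version` column:

```sql
-- 1. Read the row together with its current version.
SELECT balance, version FROM accounts WHERE id = 1;   -- say it returns version = 7

-- 2. Update only if nobody has changed the row since we read it.
UPDATE accounts
SET    balance = balance - 30,
       version = version + 1
WHERE  id = 1
AND    version = 7;                                   -- the version we read

-- 3. If this reports 0 affected rows, another transaction won the race:
--    re-read the row and retry, or report the conflict to the user.
```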
Pessimistic locking
Pessimistic locking assumes that every time we acquire data, someone may modify it, so we lock the data on every acquisition. This ensures the data cannot be modified by others while we are using it, and the lock is released when we are done. Because the data is locked, any other thread that wants to read or write it must wait.
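In a relational database, pessimistic locking is typically built on the exclusive locks described earlier. A hedged sketch of the full flow:

```sql
-- Pessimistic flow: lock first, then work with the data.
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;   -- blocks other writers
-- ... application logic computes the new balance from the value just read ...
UPDATE accounts SET balance = balance - 30 WHERE id = 1;
COMMIT;                                                 -- lock released here
```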
Conclusion
Optimistic and pessimistic locking are not lock types held by the database itself (although pessimistic locking usually takes the form of an exclusive lock); they are programming ideas, strategies for protecting data consistency that parallel the database's locking mechanisms.
- Pessimistic locking suits scenarios with frequent write operations. Under heavy read traffic, locking every read adds locking overhead and reduces the throughput of the system.
- Optimistic locking suits scenarios with frequent read operations. Under heavy write traffic, data conflicts become more likely, and to keep the data consistent the application layer must keep re-reading and retrying, which adds a large number of queries and reduces the throughput of the system.
A Final Word
Whatever type of lock you use to operate on data, watch out for deadlocks: a deadlock can be fatal to an entire application. The essence of a deadlock is a circular wait for resources, where each party holds something another is waiting for. It is therefore best to write SQL statements that operate on multiple tables in a consistent order; the other, more extreme approach is to lock all needed resources at once rather than acquiring them one by one.
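For example, the classic deadlock arises when two transactions lock the same rows in opposite orders, and keeping a consistent order avoids it. A hedged sketch:

```sql
-- Deadlock-prone: two sessions lock the same rows in opposite orders.
--   Session A: UPDATE ... WHERE id = 1;  then  UPDATE ... WHERE id = 2;
--   Session B: UPDATE ... WHERE id = 2;  then  UPDATE ... WHERE id = 1;
-- Each ends up waiting for the lock the other holds: deadlock.

-- Safe: every transaction touches rows in the same order (e.g. ascending id).
BEGIN;
UPDATE accounts SET balance = balance - 30 WHERE id = 1;
UPDATE accounts SET balance = balance + 30 WHERE id = 2;
COMMIT;
```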