
Author: Snailclimb

Source: https://segmentfault.com/a/1190000019619667

What is MySQL?

MySQL is a relational database commonly used in Java enterprise development because it is open source, free, and easy to extend. Alibaba's database systems also rely heavily on MySQL, so its stability is well proven. Because MySQL is open source, anyone can download it under the GNU General Public License (GPL) and modify it for their own needs. MySQL's default port number is 3306.

Transaction-related questions

What is a transaction?

A transaction is a logical group of operations that either all execute or none of them do.

The most classic and often-cited example of a transaction is a money transfer. If Xiao Ming wants to transfer 1000 yuan to Xiao Hong, the transfer involves two key operations: reducing Xiao Ming's balance by 1000 yuan and increasing Xiao Hong's balance by 1000 yuan. If something goes wrong between these two operations, such as the banking system crashing, and Xiao Ming's balance goes down but Xiao Hong's balance does not go up, that is clearly wrong. A transaction guarantees that these two critical operations either both succeed or both fail.
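This all-or-nothing behavior can be sketched with Python's built-in sqlite3 module standing in for MySQL (the `account` table, column names, and the failure condition here are made up for illustration):

```python
import sqlite3

def transfer(conn, from_id, to_id, amount):
    """Atomically move `amount` between two accounts: either both
    updates commit together, or the whole transaction rolls back."""
    try:
        cur = conn.cursor()
        cur.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                    (amount, from_id))
        cur.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                    (amount, to_id))
        # Simulate a mid-transfer failure: refuse to overdraw the sender.
        (balance,) = cur.execute("SELECT balance FROM account WHERE id = ?",
                                 (from_id,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.commit()            # both updates become durable together
        return True
    except Exception:
        conn.rollback()          # neither update takes effect
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 2000), (2, 0)])
conn.commit()

transfer(conn, 1, 2, 1000)   # succeeds: balances become 1000 / 1000
transfer(conn, 1, 2, 5000)   # fails: rolled back, balances unchanged
```

After the second call, the sender's balance is still 1000: the partial deduction never became visible, which is exactly the guarantee the transfer example describes.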

What are the four properties of ACID?

  • Atomicity: A transaction is the smallest unit of execution and cannot be split. Atomicity ensures that the operations either complete entirely or have no effect at all;

  • Consistency: The database remains in a consistent state before and after a transaction executes; multiple transactions reading the same data see the same result.

  • Isolation: when accessing the database concurrently, a user’s transaction is not disturbed by other transactions, and the database is independent between the concurrent transactions.

  • Durability: Once a transaction is committed, its changes to the data in the database are permanent and are not lost even if the database fails.

What are the problems associated with concurrent transactions?

In a typical application, multiple transactions run concurrently, often working on the same data to accomplish their respective tasks (multiple users operating on the same data). Concurrency, while necessary, can cause the following problems:

  • Dirty read: A transaction is accessing data and has made changes that are not yet committed to the database, and another transaction reads that data and uses it. Because the data has not been committed, what the second transaction reads is "dirty data", and operations based on that dirty data may be incorrect.

  • Lost update (lost modification): A transaction reads data that another transaction also accesses, and both transactions modify it; the modification made by the first transaction is then lost, hence the name. For example, transaction 1 reads A=20 and transaction 2 also reads A=20; transaction 1 sets A=A-1 and transaction 2 also sets A=A-1. The final result is A=19, and transaction 1's modification is lost.

  • Non-repeatable read: The same data is read multiple times within one transaction. While this transaction is still running, another transaction accesses and modifies the data, so the values read between the first transaction's two reads may differ because of the second transaction's changes. Because the data read twice within one transaction is not the same, this is called a non-repeatable read.

  • Phantom read: A phantom read is similar to a non-repeatable read. It occurs when one transaction (T1) reads several rows of data and another concurrent transaction (T2) then inserts some rows. In subsequent queries, T1 finds records that did not exist before, as if an illusion had occurred, hence the name phantom read.

The difference between non-repeatable reads and phantom reads:

Non-repeatable reads focus on modification; phantom reads focus on insertion or deletion.

Example 1: Mr. A in transaction 1 reads his salary as 1000. Before transaction 1 ends, Mr. B in transaction 2 changes A's salary to 2000, so when A reads his salary again he sees 2000. This is a non-repeatable read.

Example 2 (under the same conditions, the number of records read the first and second time differs): A payroll table has 4 people whose salary is above 3000. Transaction 1 reads all people with salary above 3000 and finds 4 records. Transaction 2 then inserts another record with salary above 3000, and when transaction 1 reads again it finds 5 records. This is a phantom read.

What are the transaction isolation levels? What is the default isolation level for MySQL?

The SQL standard defines four isolation levels:

  • READ-UNCOMMITTED: The lowest isolation level. It allows reading uncommitted data changes, which can result in dirty reads, phantom reads, or non-repeatable reads.

  • READ-COMMITTED: Allows a transaction to read only data that concurrent transactions have already committed. It prevents dirty reads, but phantom reads and non-repeatable reads can still occur.

  • REPEATABLE-READ: Multiple reads of the same field are consistent, unless the data is modified by the transaction itself. This can prevent dirty reads and unrepeatable reads, but phantom reads are still possible.

  • SERIALIZABLE: The highest isolation level, fully compliant with ACID's isolation requirement. All transactions are executed one after another, so interference between transactions is completely impossible; this level prevents dirty reads, non-repeatable reads, and phantom reads.

| Isolation level  | Dirty read | Non-repeatable read | Phantom read |
| ---------------- | ---------- | ------------------- | ------------ |
| READ-UNCOMMITTED | √          | √                   | √            |
| READ-COMMITTED   | ×          | √                   | √            |
| REPEATABLE-READ  | ×          | ×                   | √            |
| SERIALIZABLE     | ×          | ×                   | ×            |

The default isolation level supported by MySQL's InnoDB storage engine is REPEATABLE-READ. We can check it with the SELECT @@tx_isolation; command:

```sql
mysql> SELECT @@tx_isolation;
+-----------------+
| @@tx_isolation  |
+-----------------+
| REPEATABLE-READ |
+-----------------+
```

One thing to note: unlike the SQL standard, the InnoDB storage engine uses the next-key locking algorithm at the REPEATABLE READ transaction isolation level, thereby avoiding phantom reads, unlike other database systems such as SQL Server. So InnoDB's default REPEATABLE-READ level already satisfies the isolation requirements for transactions, namely those of the SQL standard's SERIALIZABLE isolation level.

Most database systems default to READ-COMMITTED isolation, because the lower the isolation level, the fewer locks a transaction requests. But remember that the InnoDB storage engine uses REPEATABLE READ by default with no performance loss.

The InnoDB storage engine typically uses the SERIALIZABLE isolation level for distributed transactions.

Index-related questions

Why do indexes speed up queries?

The following content is compiled from “Two great treasures of database [Indexes and Locks]”

Let’s start with the basic storage structure of MySQL

The basic storage unit of MySQL is the page (records are stored in pages):

  • The data pages form a doubly linked list

  • The records within each data page form a singly linked list

  • Each data page generates a page directory for the records it stores. When searching for a record by primary key, binary search on the page directory quickly locates the corresponding slot, and then only the records in that slot's group need to be traversed to find the target record.

  • When the search condition is another column (not the primary key), the page directory cannot help, and the singly linked list of records can only be traversed starting from the smallest record.

So, if we write a SQL statement like `select * from user where indexName = 'xxx'` without any optimization, the default procedure is:

  1. Locate the page where the record resides: You need to traverse the bidirectional linked list to find the page

  2. Find records from the page you are on: Because you are not querying by primary key, you can only traverse the singly linked list of the page you are on

Obviously, this lookup is very slow when the amount of data is large! The time complexity is O(n).

How does an index speed up our queries? The idea is to turn unordered data into (relatively) ordered data:

Brief steps to find the record with id = 8 (the original article illustrates these steps with a diagram):

Obviously, without an index we would need to traverse the doubly linked list to locate the page; with the index acting as a "table of contents", we can locate the page quickly (binary search, with time complexity approximately O(log n)).

In fact, the underlying structure is a B+ tree; as a tree implementation, the B+ tree lets us quickly find the corresponding record.
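The page-directory idea above can be sketched in Python. This is a toy model: the group size, keys, and row payloads are invented for illustration, and `bisect` stands in for the in-page binary search:

```python
from bisect import bisect_right

# Toy model of one data page: records sorted by primary key, plus a
# "page directory" holding the smallest key of each slot group.
records = [(k, f"row-{k}") for k in range(1, 101, 3)]   # keys 1, 4, 7, ...
GROUP = 4
directory = [records[i][0] for i in range(0, len(records), GROUP)]

def find_by_pk(key):
    """Binary-search the page directory (O(log n)), then scan only
    the small group of records the slot points at."""
    slot = bisect_right(directory, key) - 1
    if slot < 0:
        return None
    for k, row in records[slot * GROUP:(slot + 1) * GROUP]:
        if k == key:
            return row
    return None

def find_by_scan(key):
    """Without an index we can only walk the singly linked list: O(n)."""
    for k, row in records:
        if k == key:
            return row
    return None
```

Both functions return the same answers, but `find_by_pk` touches only a handful of records per lookup, which is the whole point of the directory.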

The following is excerpted from: How to Be a Java Engineer

What is the leftmost prefix principle?

An index in MySQL can cover multiple columns in a defined order; such an index is called a composite (joint) index. For example, a composite index on the User table's name and city columns is (name, city). The leftmost prefix principle states that the index can be used when the query condition exactly matches the leftmost column or consecutive leading columns of the index. For example:

```sql
select * from user where name=xx and city=xx; -- index can be hit
select * from user where name=xx;             -- index can be hit
select * from user where city=xx;             -- index cannot be hit
```

Note that if both conditions appear in the query but in a different order, such as city = xx and name = xx, the query engine will automatically reorder them to match the composite index, so the index can still be hit.

Because of the leftmost prefix principle, when creating a composite index you should order the index fields by their number of distinct values (cardinality). The ORDER BY clause also follows this rule.
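Conceptually, a composite index behaves like a list of key tuples sorted by the leftmost column first, which is why a name lookup can binary-search a contiguous range while a city-only lookup cannot. A minimal Python sketch (the table data and names are made up):

```python
from bisect import bisect_left, bisect_right

# A composite index on (name, city) is conceptually a list of
# (name, city, row_id) tuples sorted by name first, then city.
index = sorted([
    ("alice", "beijing", 1), ("alice", "shanghai", 2),
    ("bob",   "beijing", 3), ("carol", "hangzhou", 4),
])

def lookup_name(name):
    """Leftmost column alone: the index is sorted by name, so all
    matches sit in one binary-searchable contiguous range."""
    lo = bisect_left(index, (name,))
    hi = bisect_right(index, (name, chr(0x10FFFF)))
    return [row_id for _, _, row_id in index[lo:hi]]

def lookup_city(city):
    """city alone skips the leftmost column: matches are scattered,
    so the whole index must be scanned (the index is not 'hit')."""
    return [row_id for _, c, row_id in index if c == city]
```

`lookup_name` is the (name) prefix case; `lookup_city` shows why `where city=xx` cannot use the (name, city) index.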

Be careful to avoid redundant indexes

A redundant index is an index whose function can be fully served by another index. For example, (name, city) and (name) are redundant: any query that can hit the latter can certainly hit the former. In most cases, you should try to extend an existing index rather than create a new one.

Since MySQL 5.7, you can find redundant indexes by querying the schema_redundant_indexes view in the sys schema.

How do we add an index to a MySQL table?

1. Add a PRIMARY KEY

```sql
ALTER TABLE `table_name` ADD PRIMARY KEY ( `column` )
```

2. Add UNIQUE index

```sql
ALTER TABLE `table_name` ADD UNIQUE ( `column` )
```

3. Add a normal INDEX

```sql
ALTER TABLE `table_name` ADD INDEX index_name ( `column` )
```

4. Add a FULLTEXT index

```sql
ALTER TABLE `table_name` ADD FULLTEXT ( `column` )
```

5. Add multiple column indexes

```sql
ALTER TABLE `table_name` ADD INDEX index_name ( `column1`, `column2`, `column3` )
```

The storage engine

Some common commands

Check out all the storage engines that MySQL provides

```sql
mysql> show engines;
```

From the output of this command, we can see that MySQL's default storage engine is InnoDB, and that among all the storage engines in 5.7 only InnoDB is transactional; that is, only InnoDB supports transactions.

View the current default storage engine of MySQL

We can also view the default storage engine by using the following command.

```sql
mysql> show variables like '%storage_engine%';
```

View the storage engine for the table

```sql
show table status like "table_name";
```

How MyISAM differs from InnoDB

MyISAM was MySQL's default storage engine prior to version 5.5. It performs well and offers a number of features, including full-text indexing, compression, and spatial functions, but it does not support transactions or row-level locking, and its biggest drawback is that it cannot safely recover from a crash. In version 5.5, MySQL introduced InnoDB, a transactional storage engine, which has been the default ever since.

Most of the time we use the InnoDB storage engine, but there are situations where MyISAM is appropriate, such as read-intensive workloads (if you don't mind MyISAM's crash-recovery problem).

A comparison of the two:

  1. Whether row level locking is supported or not: MyISAM only has table-level locking, while InnoDB supports row-level locking and table-level locking. The default is row level locking.

  2. Support for transactions and safe crash recovery: MyISAM emphasizes performance; each query is atomic and executes faster than InnoDB, but it provides no transaction support. InnoDB provides transactions, foreign keys, and other advanced database features, with transaction-safe (ACID-compliant) tables that support COMMIT, ROLLBACK, and crash recovery.

  3. Whether foreign keys are supported: MyISAM does not support them, InnoDB does.

  4. MVCC support: Only InnoDB supports MVCC. For highly concurrent transactions, MVCC is more efficient than simple locking. MVCC works only at the READ COMMITTED and REPEATABLE READ isolation levels; MVCC can be implemented with optimistic or pessimistic locking; and MVCC implementations are not uniform across databases.

  5. …

High Performance MySQL contains a sentence that reads:

Don’t believe the rule of thumb that “MyISAM is faster than InnoDB”. This conclusion is not always absolute. InnoDB is faster than MyISAM in many known scenarios, especially with clustered indexes or applications where data that needs to be accessed can be stored in memory.

In general InnoDB is fine, but MyISAM can be a good choice when you don't care about scalability and concurrency, transaction support, or safe recovery after a crash. In general, though, we do need to consider these issues.

The difference between optimistic and pessimistic locks

Pessimistic locking

Pessimistic locking always assumes the worst: every time it fetches data, it assumes someone else will modify it, so it locks the data on every access, and anyone else who wants that data blocks until the lock is released (the shared resource is used by only one thread at a time; other threads block and the resource is handed over afterward). Traditional relational databases use this locking mechanism heavily, with row locks, table locks, read locks, write locks, and so on, all of which acquire the lock before operating. Exclusive locks such as synchronized and ReentrantLock in Java are implementations of the pessimistic-locking idea.

Optimistic locking

Optimistic locking always assumes the best case: every time it fetches data, it assumes others will not modify it, so it does not lock; only when updating does it check whether anyone else updated the data in the meantime, which can be done with the version number mechanism or the CAS algorithm. Optimistic locking suits read-heavy applications and improves throughput; databases provide mechanisms similar to write_condition, which are optimistic locks. In Java, the atomic variable classes in the java.util.concurrent.atomic package are implemented using CAS, one form of optimistic locking.

Two types of lock usage scenarios

From the introduction above, we can see that each kind of lock has its strengths and weaknesses; neither is better in all cases. Optimistic locking suits scenarios with few writes (read-heavy scenarios), where conflicts are genuinely rare; it saves the locking overhead and increases overall system throughput. In write-heavy scenarios, however, conflicts arise often, causing the upper application layer to retry repeatedly, which reduces performance, so pessimistic locking is more appropriate there.

There are two common implementations of optimistic locking

Optimistic locking is usually implemented using the version number mechanism or CAS algorithm.

1. Version number mechanism

Generally, a version field is added to the table to record how many times the data has been modified; each modification increments version by one. When thread A wants to update the data, it reads the version value along with the data, and when submitting the update it writes only if the version it read matches the version currently in the database; otherwise it retries the update until it succeeds.

Take a simple example: Suppose you have a version field in the account information table in the database with the current value of 1. The current account balance field (balance) is $100.

  1. Operator A reads it (version=1) and deducts $50 from the account balance ($100-$50).

  2. During operator A’s operation, Operator B also reads in this user information (version=1) and deducts $20 ($100-$20) from his account balance.

  3. Operator A completes the modification, increments the version number (version=2), and submits the update to the database together with the deducted balance (balance=$50). Because the submitted version is greater than the current version recorded in the database, the data is updated and the database record's version becomes 2.

  4. Operator B completes the operation, also increments the version number (version=2), and attempts to submit the data (balance=$80) to the database. But comparing against the database record, the version of the data submitted by operator B is 2 while the current version of the database record is also 2, which fails the optimistic-locking policy that "the submitted version must be greater than the record's current version for the update to be performed". Operator B's submission is therefore rejected.

In this way, operator B is prevented from overwriting operator A's result with a modification based on the older version=1 data.
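The version check described above maps naturally onto a single conditional UPDATE. A sketch using Python's sqlite3 in place of MySQL (the schema and helper names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY,"
             " balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100, 1)")
conn.commit()

def read(conn, account_id):
    return conn.execute("SELECT balance, version FROM account WHERE id = ?",
                        (account_id,)).fetchone()

def optimistic_update(conn, account_id, new_balance, read_version):
    """Succeeds only if nobody changed the row since we read it:
    the WHERE clause re-checks the version we originally saw."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, read_version))
    conn.commit()
    return cur.rowcount == 1    # 0 rows touched => stale read, rejected

# Operators A and B both read (balance=100, version=1).
_, v_a = read(conn, 1)
_, v_b = read(conn, 1)
ok_a = optimistic_update(conn, 1, 100 - 50, v_a)   # A commits first
ok_b = optimistic_update(conn, 1, 100 - 20, v_b)   # B's version is stale
```

A's update succeeds and bumps the version to 2; B's update touches zero rows because the version it read (1) no longer matches, so B must re-read and retry rather than silently overwrite A's change.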

2. CAS algorithm

Compare-and-swap (CAS) is a well-known lock-free algorithm. Lock-free programming means synchronizing variables between multiple threads without using locks; it is also called non-blocking synchronization because no thread is blocked. The CAS algorithm involves three operands:

  • The memory value V to be read or written

  • The expected value A to compare against

  • The new value B to be written

CAS atomically updates the value of V to the new value B if and only if the current value of V equals A; otherwise it does nothing (the compare and the replace together form one atomic operation). Typically it is used as a spin operation, that is, retried repeatedly until it succeeds.
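The compare-then-swap step and the spin retry can be modeled in Python. This is a toy `AtomicInt` whose internal lock stands in for the CPU's atomic instruction; all names here are made up for illustration:

```python
import threading

class AtomicInt:
    """Toy model of CAS: compare_and_swap is made atomic with an
    internal lock, standing in for the hardware atomic instruction."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:                 # the "atomic" step
            if self._value == expected:  # compare...
                self._value = new        # ...and swap
                return True
            return False

def increment(atom):
    """Spin CAS: retry until our read was not invalidated in between."""
    while True:
        current = atom.get()             # read V
        if atom.compare_and_swap(current, current + 1):
            return

counter = AtomicInt(0)
threads = [threading.Thread(target=lambda: [increment(counter) for _ in range(1000)])
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```

Four threads each perform 1000 spin-CAS increments; because a failed compare simply retries, no increment is lost and the counter ends at exactly 4000.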

Disadvantages of optimistic locking

The ABA problem is a common issue with optimistic locking.

1. The ABA problem

If a variable V had the value A when first read and still has the value A when we are ready to assign to it, can we conclude that its value was never modified by another thread? Obviously not: in the meantime its value could have been changed to something else and then back to A, and the CAS operation would conclude it was never changed. This is known as the "ABA" problem of CAS operations.

Since JDK 1.5, the AtomicStampedReference class has provided a remedy: its compareAndSet method checks both whether the current reference equals the expected reference and whether the current stamp equals the expected stamp, and only if both match does it atomically set the reference and the stamp to the given new values.
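The stamped-reference idea can be sketched in Python. This is a simplified, single-threaded model of what AtomicStampedReference does; the class and method names loosely mirror the Java API but are assumptions, not its real implementation:

```python
class StampedRef:
    """Sketch of a stamped reference: every write bumps a stamp, so
    an A -> B -> A history is distinguishable from 'never changed'."""
    def __init__(self, value):
        self.value, self.stamp = value, 0

    def compare_and_set(self, exp_value, new_value, exp_stamp):
        # Both the value AND the stamp must match for the swap to happen.
        if self.value == exp_value and self.stamp == exp_stamp:
            self.value, self.stamp = new_value, self.stamp + 1
            return True
        return False

ref = StampedRef("A")
v0, s0 = ref.value, ref.stamp        # thread 1 reads ("A", stamp 0)
ref.compare_and_set("A", "B", 0)     # thread 2: A -> B  (stamp becomes 1)
ref.compare_and_set("B", "A", 1)     # thread 2: B -> A  (stamp becomes 2)
# A plain value-only CAS would wrongly succeed now; the stamp catches it:
aba_detected = not ref.compare_and_set(v0, "C", s0)
```

Even though the value is "A" again, the stamp has moved from 0 to 2, so the stale CAS from thread 1 is rejected instead of silently succeeding.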

2. Long cycle time and high cost

If a spin CAS (looping on failure until it succeeds) keeps failing for a long time, it imposes a large CPU cost. If the JVM can use the pause instruction provided by the processor, efficiency improves somewhat. The pause instruction has two effects: first, it delays the pipelined instructions so the CPU does not consume too many execution resources; second, it avoids the CPU pipeline flush caused by memory-order violations when exiting the loop, improving CPU execution efficiency.

3. Atomicity is guaranteed for only one shared variable

CAS is valid only for a single shared variable and does not work when the operation spans multiple shared variables. However, since JDK 1.5 the AtomicReference class has been available to guarantee atomicity for reference objects, so you can put multiple variables into one object and perform CAS on it. In other words, we can either use locks or use AtomicReference to merge multiple shared variables into a single one.

Locking mechanism and InnoDB locking algorithm

MyISAM and InnoDB storage engines use locks:

  • MyISAM uses table-level locking.

  • InnoDB supports row-level locking and table-level locking. The default row-level locking is performed

Table level locking vs. row level locking

  • Table-level lock: the lock with the largest granularity in MySQL; it locks the entire table for the current operation. It is simple to implement, consumes few resources, locks quickly, and avoids deadlocks. But its locking granularity is the largest, its probability of lock conflicts is the highest, and its concurrency is the lowest. Both the MyISAM and InnoDB engines support table-level locking.

  • Row-level lock: the lock with the smallest granularity in MySQL; only the current row is locked. Row-level locking can greatly reduce conflicts in database operations. Its locking granularity is the smallest and its concurrency is high, but the locking overhead is also the largest, locking is slower, and deadlocks can occur.

InnoDB storage engine lock algorithm has three kinds:

  • Record Lock: A lock on a single row Record

  • Gap Lock: A Gap lock that locks a range, excluding the record itself

  • Next-key lock: record lock + gap lock; locks a range, including the record itself

Related knowledge:

  • Innodb uses next-key lock for row queries

  • Next-key locking solves the Phantom Problem

  • When the queried index has a unique attribute, the next-key lock is demoted to a record lock

  • Gap locks are designed to prevent multiple transactions from inserting records into the same range, which can cause phantom problems

  • There are two ways to explicitly disable gap locks (only record locks are then used, except for foreign-key constraint and uniqueness checks): A. set the transaction isolation level to READ COMMITTED (RC); B. set innodb_locks_unsafe_for_binlog to 1.

Big table optimization

When the number of MySQL single table records is too large, the CRUD performance of the database will be significantly reduced. Some common optimization measures are as follows:

1. Limit the scope of your data

Be sure to disallow queries that do not have any conditions limiting the data range. For example, when users query the order history, we can control it within a month;

2. Read/write separation

The classical database split scheme, the master library is responsible for writing, the slave library is responsible for reading;

3. Vertical splitting

Split according to the correlation of the tables in the database. For example, if the user table contains both the user login information and the user’s basic information, you can split the user table into two separate tables, or even put them into separate libraries.

To put it simply, vertical splitting is the splitting of data table columns. A table with many columns is split into multiple tables. This should make it a little bit easier to understand.

  • Advantages of vertical split: The column data is smaller, the number of blocks read during query is reduced, and the I/O count is reduced. In addition, vertical partitioning simplifies table structure and is easier to maintain.

  • Disadvantages of vertical split: Redundant primary keys, need to manage redundant columns, and may cause Join operations, which can be solved by joining at the application layer. In addition, vertical partitioning makes transactions more complex;

4. Horizontal splitting

Keep the data table structure unchanged and store the data shards with some policy. In this way, each piece of data is dispersed to different tables or libraries, achieving the purpose of distribution. Horizontal splitting can support very large amounts of data.

Horizontal splitting splits data table rows: when the number of rows in a table exceeds 2 million, operations slow down, and the table's data can then be split across multiple tables. For example, we can split the user information table into several user information tables to avoid the performance impact of too much data in a single table.

Horizontal splitting can support very large amounts of data. Note, however, that splitting a table only solves the problem of too much data in a single table; because the split tables still live on the same machine, it does little to improve MySQL's concurrency, so horizontal splitting is best done across databases (sharding).

Horizontal splitting can support very large data volumes with little application-side change, but sharded transactions are hard to solve, cross-node joins perform poorly, and the logic becomes complicated. The author of "The Way to Train Java Engineers" recommends avoiding data sharding when possible, because of the complexity it brings to logic, deployment, and operations; an ordinary data table can support under 10 million rows with proper optimization. If sharding is necessary, prefer the client sharding architecture, which saves a network round trip to a middleware proxy.

Here are two common solutions for database sharding:

  • Client proxy: the sharding logic lives on the application side, encapsulated in jar packages and implemented by modifying or wrapping the JDBC layer. Dangdang's Sharding-JDBC and Alibaba's TDDL are two commonly used implementations.

  • Middleware proxy: a proxy layer is added between the application and the data, and the sharding logic is maintained centrally in the middleware service. Mycat, 360's Atlas, NetEase's DDB, and so on are implementations of this architecture.
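Whichever proxy style is used, the core routing step is the same: map a sharding key to a physical table or database. A minimal sketch of hash-style routing (the shard count and table-naming scheme are made up for illustration):

```python
def shard_for(user_id, shard_count=4):
    """Hash-style routing: the sharding key (here user_id) determines
    which shard a row lives on. Modulo is the simplest strategy."""
    return user_id % shard_count

def table_for(user_id, shard_count=4):
    """Map the shard number onto a concrete split-table name."""
    return f"user_info_{shard_for(user_id, shard_count)}"
```

The same function must be used for every read and write of that key, which is why the logic is either packaged into a client jar or centralized in a middleware service, as described above.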

END.
