Hello, I’m Leo. Currently, I am engaged in Java backend development in Changzhou. This article is the summary of MySQL learning collation series. This series will be output together with byte, netease, Ali, Tencent, Meituan and Kuaishou. Hope to help more friends as soon as possible into the factory!
After reading this article, don’t say you have learned MySQL.
After reading this article, stop saying that you have learned MySQL.
After reading this article, stop saying that you have learned MySQL.
After reading this article, stop saying that you have learned MySQL.
Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.
Read 10. Magic
What is 10.1 and why is it there
This is done from the start of a transaction to the end of a transaction if the same data sees different results. That’s what we call phantasmagoria.
Let’s take an example
Transaction A reads data according to certain conditions, during which transaction B inserts new data with the same search conditions. When transaction A reads data according to the original conditions again, the newly inserted data of transaction B is found, which is called phantom read
10.2 What else can you think of
Dirty read
Dirty reads are a common problem at the uncommitted read isolation level. Changes in a transaction that are visible to other transactions can be called dirty reads even if they are not committed
10.3 Solutions
If the data is newly inserted, the gap lock can be used to solve the illusory problem.
If it is to modify a data can be used to lock the way to solve the illusory problem.
With the isolation level of serialization, phantom reading does not exist. Because serialization locks the entire table.
The specific implementation, principle and method of MySQL lock will be introduced in module 11
11. Lock (not done)
11.1 the global lock
The command
11.4 clearance lock
11.5 read-write lock
11.6 Shared lock
11.7 exclusive lock
11.8 intention to lock
11.9 Metadata Locking
11.2 table level lock
11.3 row-level locks
12. Left-most matching principle
MySQL is known to satisfy the leftmost matching principle. This is a very common point in the interview, and it will ask you to explain what the left-most matching principle is and how to apply it. Let’s introduce it.
Suppose the composite indexes are A, B, and C. We introduce each case by case
- A and A together satisfy the leftmost matching rule
- Using A and B together completely satisfies the leftmost matching principle
- B and A together satisfy the leftmost matching principle
- Using A, B, and C together completely satisfies the leftmost matching principle
- Partially using A and C together satisfies the leftmost matching principle
- B and C together do not meet the leftmost matching rule
So let’s summarize this, the leftmost principle. That is, the order of conditions in an SQL statement is irrelevant. The leftmost match rule is as long as the leftmost index tree appears. In the Explain execution plan, the key column can be used to check for a hit and left-most match.
Personal advice: here is a digression, it is suggested that every write a SQL we have to save the habit of walking the execution plan, if the index, the SQL optimization, a long time, slowly on the SQL optimization has a simple understanding, with some theory, your personal strength will definitely go to the next floor!
13. How to ensure primary/secondary synchronization of MySQL
MySQL master-slave synchronization is a problem that is guaranteed by the existence of these three formats. MySQL master-slave synchronization is guaranteed by the existence of three formats.
The command parameter
Binlog_format = 'row'Copy the code
statement
The first is the first statement. It’s general information, it’s almost our execution information, we don’t see what the logic is. So if you synchronize to a slave library, it’s easy to find data inconsistencies.
The advantage of the log format is that it is simple and occupies less space, but it is risky. Once data is lost, the corresponding data cannot be found.
row
The second type is binlog in row format. The advantage of this format is that the log is rich. As long as you have a binlog in row format, you can do whatever you want and the lost data can be retrieved at any time (usually 15 days). The only drawback is that the log is too rich, the memory footprint is too large, if it is online, after the disk write, the risk is also relatively large. You need to do something special.
For example, log backup is transferred periodically and set an expiration time. Save data within 15 days, nothing beyond 15 days!
mixed
The emergence of the format here is a combination of the above, why say so? Mixed format uses the advantages of statement format and row format.
I think a lot of readers might wonder, can anything be perfect? Let’s talk about it in detail.
Mixed will make an additional judgment as to whether the binlog will cause data inconsistencies. If so, use the row format. If not, log in statement format.
Master-slave synchronization
The primary and secondary databases rely mainly on binlog logs for data consistency synchronization. If the logs are kept in detail when we perform operations, then it is enough to ensure the consistency of the primary and secondary databases.
Here we extend two problems
(1) : Data security problem during primary/secondary switchover. There are two libraries: LIBRARY A and library B. The client accesses library A at first, and at this time, the master/slave switch is performed, and the master library is switched from A to B. (The data synchronization thread has super administrator rights.) When the client accesses library B, if the slave library is set to readonly mode
- Prevents misoperation of query statements of other running classes. Cause data inconsistency problems.
- This prevents logic bugs when switching between A and B
② : Primary/secondary synchronous cyclic replication problem. A statement is updated on node A and the generated binlog is sent to node B. Node B also generates A binlog after executing the update statement. I recommend that you set log_slave_updates to ON, indicating that the standby database generates a binlog after executing the relay log. So, if node A is also the standby library of node B, it executes the new binlog generated by node B again, and then the update statement is repeated between node A and node B, which is cyclic replication. Solutions are as follows:
- Specify that the server ids of two libraries must be different. If they are the same, the relationship between them cannot be set as master/slave.
- A standby database receives a binlog and, during playback, generates a new binlog with the same server ID as the original one.
- After receiving the log from the primary library, each library determines the server ID. If the log is the same as its own, it indicates that the log is generated by itself and directly discards the log.
Here do not do detailed introduction, for the interview to build! See other articles for detailed technical points.
14. What is the high availability of MySQL
When it comes to high availability, there are certainly two strategies. Reliability first policy, availability first policy.
When an interviewer asks you a question, you have to start at the beginning. For example, why is there high availability? You can talk about hardware issues, large transaction issues, large table DDL issues, replication from libraries issues. These series of problems led to delays, and two strategies were introduced for high availability. Here’s what those two strategies are.
Reliability-first strategy
- Judge from library B
seconds_behind_master
Whether the value is smaller than a certain value. If the value is larger than a certain value, the delay will affect the service data. Therefore, it must be smaller than a certain value before proceeding to the next step - Change primary library A to read-only and readonly to true
- To determine
seconds_behind_master
Until the value becomes 0. (Data is secure only if latency is low enough) - To change from library B to read/write, that is, to change readOnly to flase
- Finally, put all business requests to B
The second step here changes the main library A to read-only mode. This is the time that the main library A is not available.
Availability first policy
Compared to reliability-first, the only difference is. This does not wait until synchronization is complete before switching over and changing the state. It’s about putting a series of steps in place all at once. This is dangerous because the availability first policy of the active/standby switchover may cause data inconsistency. Therefore, in most cases, I recommend that you use a reliability-first policy. After all, data reliability is generally better than availability for data services.
The specific strategy depends on the business! In groups that don’t understand very well
15. Distributed transaction primary key ID
Some time ago, the company went on a hot program, separate library and table, read and write separation, one master and many slave technology stack. The problem of globally unique ids is considered. We use the snowflake algorithm to generate unique ids.
- The first part, which is a bit: 0, is meaningless.
- The second part is 41 bits: the timestamp.
- The third part contains five bits: the machine room ID, 10001.
- The fourth part has five bits: the machine ID, 1 1001.
- The fifth part is the 12 bits: the serial number, which is the serial number of the ID generated at the same time in a millisecond on a machine in a machine room, 0000 00000000.
This algorithm can ensure that a machine in a machine room generates a unique ID in the same millisecond. It is possible to generate multiple ids in a millisecond, but with the ordinal number of the last 12 bits to distinguish them.
15.1 the advantages
(1) High performance and high availability: the generation does not depend on the database, and is completely generated in memory.
(2) Large capacity: millions of self-increasing ids can be generated every second.
(3) ID increment: save in the database, index efficiency is high.
15.2 disadvantages
Depending on the consistency with the system time, if the system time is called back or changed, id conflicts or duplicate may occur.
15.3 Volume considerations
In the real development process, except for the first line of Internet factories will have so many machines, it is estimated that we will not contact so many machines, we can improve the algorithm, to generate 18 bit ID is enough for our use.
How many bits are generated here depends on the size of the company