Last Thursday lunch break, I was on the station to take a nap, sleep as if to see their own with Li Bai in the glory of the valley kill four scene, just when I just took five kill ready to lead teammates to push the opposite crystal, a panic and rapid “bad” woke me up from sleep……

Abnormal SQL statement

I narrowed my dim eyes and found that the voice just came from my team leader Zhuang Ge. Seeing him nervously opening the log system to check the log, I had a hunch that something bad was happening.

It turns out that during the time I squinted, the ONLINE database server CPU was full, triggering the production database read only delay limit and an alarm, which lasted for half an hour.

This made me gasp, because many of the systems made by our group use the same database server, and there are hundreds of thousands of daily active users. If the server breaks down, all the system services will be unavailable.

So we hurried through the SQL log to find the problem, and finally checked out because a HIGH volume of SQL query did not take the index.

The log list shows that the number of rows scanned by this SQL statement is up to millions, which is basically the case of full table scan. In addition, tens of thousands of times were queried within half an hour, and each SQL query took more than 3000ms.

My god, no wonder the server CPU full, such a time-consuming SQL statement query so large, of course, the database resources are directly crashed.

Here is the query for that SQL: Temporary processing

Look at this statement, I gasped again, this is not my system call SQL statement? It’s over. There’s no escaping this time. You’re in your sleep.Of course, since I wrote the SQL myself, I could tell at a glance that there was something wrong with this statement.

According to my code processing, this SQL call is missing an important parameter user_fruit_id. If this parameter is not passed, this SQL query should not be followed.

In my design, this parameter is the leftmost field of a joint index in the table. If this field is not passed, the index will not take effect.

KEY `idx_userfruitid_type` (`user_fruit_id`,`task_type`,`receive_start_time`,`receive_end_time`) USING BTREE
Copy the code

Although the SQL statement is located, but the problem on the line is urgent, it is impossible to find out the Bug to change the line again.

As a result, we added a new federated index to the original table, essentially removing the user_fruit_id field, so that these high-volume queries could use the new index.

Like this:

KEY `idx_task_type_receive_start_time` (`task_type`,`receive_start_time`,`receive_end_time`,`created_time`) USING BTREE
Copy the code

By adding indexes, the number of rows scanned is significantly reduced, and the instance can be restarted.

Leftmost matching principle

So why is the leftmost field not indexed? This is because MySQL’s joint index is based on the leftmost matching principle.

As we all know, the bottom of the index is a B+ tree structure, and the structure of the joint index is also a B+ tree, but the number of key values is not one, but multiple, to build a B+ tree can only be built according to one value, so the database builds the B+ tree according to the leftmost field of the joint index.

For example, we use the joint index of two fields (name, age) to analyze:The picture comes from teacher Lin Xiaobin’s “MySQL Practice 45 lecture” course

When we search for all records whose name is “Zhang SAN” in the WHERE condition, we can quickly locate ID4 and find all records that contain “Zhang SAN”.

Name = “zhang SAN” and age = 10;

Because the joint index has two key-value pairs, it is possible to further locate the specific AGE record as long as the previous name is determined.

But if you query only age, then the index will not take effect, because the leftmost field does not match, and all subsequent index fields will not take effect.

This is why the SQL statement I wrote earlier is missing the leftmost user_fruit_id field and is running a full table scan query.

Normally, if a joint index is designed like (a, b), then a and B will be used as a condition, or A alone as a query condition, in which case we do not need to design a separate index for a.

If (a, b) is not available, you will have to maintain another index, which means you need to maintain both (a, b) and (b).

Find out the bugs

Although the temporary processing, but the problem is not solved, it is obvious that the system has a Bug will have such query conditions.

Because I wrote the code by myself, I immediately located the specific method in the code after KNOWING which SQL it was. Later, I found out that it was because my nullating of the user_fruit_id field did not take effect.

Since this field is passed from the caller, I annotate it with a non-null limit in the method argument, which is @notnull in the Javax package:

public class GardenUserTaskListReq implements Serializable { private static final long serialVersionUID = -9161295541482297498L; @apiModelProperty (notes = "fruit id") @notnull (message =" fruit ID cannot be empty ") private Long userFruitId; */..................... }Copy the code

Although THIS annotation is used for non-null verification, I have not added another annotation, @Validated, to the parameter.

If this annotation is not included, then the validation rules for calling the Javax package are not in effect. The correct way to write this annotation is to add a comment before the arguments of the Controller layer method:In addition, because the user_fruit_id field is the primary key of another table, I did not query the existence of this id in the code.

This way, no matter what value the caller passes, the SQL query is triggered directly and the full table scan is performed without running the index.Have to say, this is really a low-level mistake, really, I really feel di laugh to this reason, no matter how to say also work for a few years, how also make some novice level mistakes, this face dozen really let me quite ashamed.

conclusion

Although it was a simple mistake, the consequences were quite serious. This incident also made me more alert, and I must abide by the principles in the future development work, which are as follows:

① The calling end cannot be trusted. Important parameters must be verified first, even if they are non-null values. If they do not meet the conditions, they must return or throw exceptions directly. They cannot participate in business SQL queries, otherwise frequent access will also cause a burden on the service.

② Perform performance query first for SQL statements. For tables with a large amount of data, after the index is built, all SQL queries are tested with explain and the index is further optimized based on the results.

③ The code must be reviewed. I did not pay much attention to the Review of the code before, although it was also due to the tight iteration schedule, but in any case, the bugs were caused by my negligence, especially small errors such as null values that are common in Java.

Sometimes a small Bug can easily lead to the collapse of the whole system. This problem also makes me deeply understand the importance of reviewing code. No matter how troublesome the workload of business development is, this step should never be ignored.

From: MY humble Xue author: very lazy programmer