Overview
This write-up skips over many of the pitfalls we hit along the way, such as Redis and Tomcat environment configuration and machine hardware configuration. The load-test environment should either match production or be adjusted by a hardware scaling factor. For example, with production on 8C16G and the load test on 4C8G, a simple factor of 2 applies. Instead, it goes straight to the main approach we used to dig out the bottleneck.
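The hardware adjustment above is simple arithmetic. The sketch below assumes that capacity scales linearly with the scarcest resource, which is only a rough rule of thumb and not something the article measured. The helper names and the 1000 QPS target are hypothetical.

```python
def hardware_factor(prod_cores: int, prod_mem_gb: int,
                    test_cores: int, test_mem_gb: int) -> float:
    """Rough production-to-test capacity ratio.

    Assumption: capacity scales linearly with whichever resource has the
    smallest ratio (the likely bottleneck). Real systems rarely scale this
    cleanly, so treat the result as a first estimate only.
    """
    return min(prod_cores / test_cores, prod_mem_gb / test_mem_gb)


def target_for_test_env(prod_target_qps: float, factor: float) -> float:
    """Scale a production throughput target down to the smaller test machine."""
    return prod_target_qps / factor


factor = hardware_factor(8, 16, 4, 8)          # production 8C16G vs test 4C8G
print(factor)                                  # 2.0
print(target_for_test_env(1000, factor))       # 500.0 for a hypothetical 1000 QPS target
```

Taking the minimum ratio avoids overestimating production capacity when CPU and memory are not scaled by the same amount.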
Load test data analysis
Overall picture
While running a high-concurrency load test against a live-stream viewing page, we noticed something interesting in our APM (Pinpoint) monitoring:
The two red boxes in the figure above mark response times of close to 10 s, occurring roughly 30 minutes apart around 16:20. At those moments the system could not keep up and the service became unavailable. Curious, we traced the method call stack, shown in the figure below:
Where did the time go? First, a few concepts from the Call Tree view:
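Call-tree views in profilers and APM tools typically distinguish a call's total execution time (children included) from its self time (total minus the time spent in its children); the node with the largest self time is where the request actually waits. The sketch below models that general idea. It is not Pinpoint's API, and the node names and millisecond values are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CallNode:
    name: str
    exec_ms: float                       # total wall time, children included
    children: list[CallNode] = field(default_factory=list)

    def self_ms(self) -> float:
        """Time spent in this call itself, excluding its children."""
        return self.exec_ms - sum(c.exec_ms for c in self.children)


def hottest(node: CallNode) -> CallNode:
    """Return the node with the largest self time anywhere in the tree."""
    best = node
    for child in node.children:
        candidate = hottest(child)
        if candidate.self_ms() > best.self_ms():
            best = candidate
    return best


# Hypothetical trace of a slow request: the page handler calls Redis, then MySQL.
root = CallNode("GET /live/page", 14300, [
    CallNode("redis GET", 2),
    CallNode("mysql SELECT", 14200),
])
hot = hottest(root)
print(f"{hot.name}: {hot.self_ms()} ms self")   # mysql SELECT: 14200 ms self
print(root.self_ms())                           # 98
```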
This SQL query took more than 14 seconds. Why? APM displays the MySQL statement that was executed, so we kept digging!
For comparison, we looked at a request to the same URL that completed in milliseconds; its trace is shown in the figure below:
In the fast request, the data came from Redis and the SQL query was never executed. Based on this analysis, we went back to the code to find the root cause:
The code shows that as soon as the cache entry expires, requests go straight to the database.
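To see why this hurts under high concurrency, here is a minimal simulation of that unprotected cache-aside pattern. A plain dict stands in for Redis, an empty cache stands in for the moment the entry expires, and `SlowDatabase` is a hypothetical stand-in for MySQL. All the names are illustrative, not taken from the original code.

```python
import threading
import time


class SlowDatabase:
    """Stand-in for MySQL: counts queries and sleeps to mimic a slow SELECT."""

    def __init__(self, delay_s: float = 0.05):
        self.delay_s = delay_s
        self.queries = 0
        self._lock = threading.Lock()

    def query(self, key: str) -> str:
        with self._lock:
            self.queries += 1
        time.sleep(self.delay_s)
        return f"value-for-{key}"


class NaiveCache:
    """Cache-aside with no protection: on a miss, every caller hits the DB."""

    def __init__(self, db: SlowDatabase):
        self.db = db
        self.data: dict[str, str] = {}   # stand-in for Redis

    def get(self, key: str) -> str:
        value = self.data.get(key)
        if value is None:                # miss, e.g. the entry just expired
            value = self.db.query(key)   # each concurrent miss queries the DB
            self.data[key] = value
        return value


def burst(cache, key: str, n_threads: int = 50) -> None:
    """Fire n_threads requests for the same key at the same instant."""
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()
        cache.get(key)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


db = SlowDatabase()
burst(NaiveCache(db), "room:42")
print(db.queries)   # usually close to 50; the exact count depends on scheduling
```

Every request that arrives before the first query finishes sees a miss and issues its own query, so one expired key turns into a burst of identical slow queries.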
Solutions
SQL optimization: low priority
The data shows that SQL optimization would not help much: the query neither returns a large amount of data nor is missing an index, so this step can be skipped.
Cache concurrency: high priority
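One common mitigation for this kind of cache stampede is to let only one caller rebuild an expired entry while the others wait and then re-read the cache. The sketch below assumes an in-process per-key lock with a double-checked lookup; the article does not say which fix was actually used, and all names are hypothetical.

```python
import threading
import time


class SlowDatabase:
    """Stand-in for MySQL: counts queries and sleeps to mimic a slow SELECT."""

    def __init__(self, delay_s: float = 0.05):
        self.delay_s = delay_s
        self.queries = 0
        self._lock = threading.Lock()

    def query(self, key: str) -> str:
        with self._lock:
            self.queries += 1
        time.sleep(self.delay_s)
        return f"value-for-{key}"


class LockedCache:
    """Cache-aside with a per-key lock: only one caller reloads a missing key."""

    def __init__(self, db: SlowDatabase):
        self.db = db
        self.data: dict[str, str] = {}            # stand-in for Redis
        self._locks: dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key: str) -> threading.Lock:
        # Guard the lock table so all threads get the same lock object per key.
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self.data.get(key)
        if value is not None:                     # fast path: cache hit
            return value
        with self._lock_for(key):
            value = self.data.get(key)            # re-check: another thread may have refilled it
            if value is None:
                value = self.db.query(key)        # only the first waiter gets here
                self.data[key] = value
        return value


def burst(cache, key: str, n_threads: int = 50) -> None:
    """Fire n_threads requests for the same key at the same instant."""
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()
        cache.get(key)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


db = SlowDatabase()
burst(LockedCache(db), "room:42")
print(db.queries)   # 1
```

An in-process lock only protects a single application instance. With several Tomcat nodes, the same idea is usually implemented as a distributed lock, for example Redis `SET lock-key token NX PX <ms>`. Alternatives include logical expiry with asynchronous refresh or adding random jitter to TTLs.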
Lessons learned
1. Make good use of monitoring tools such as APM to observe request traces, server performance, and method call order.
2. Trace method stacks and the related logs.
3. Dig deep into the code to find the root cause.
WeChat official account: Le Shao Bulletin Board