Background
By the beginning of 2021, the functional build-out of the system I was responsible for was complete, and we began promoting it. As adoption gradually deepened, we received plenty of praise, but also plenty of ribbing about performance.
As more and more performance jokes came back to us, we realized that interface performance had to become a higher priority, so we tracked our interface performance monitoring for a week.
The result: more than 20 slow interfaces. Five had response times over 5s, one was over 10s, and the rest were over 2s, with availability below 99.8%. No self-respecting back-end programmer can tolerate numbers like that, and we immediately set out on the long road of interface optimization. This article is a summary of that journey.
Now, on to the main text!
What causes interface performance problems?
There are many answers to this question, and they all need to be weighed against your business scenario. Here's an incomplete summary:
- Slow database queries
  - Deep paging
  - Missing indexes
  - Index failure
  - Too many joins
  - Too many subqueries
  - Too many values in an IN clause
  - Sheer data volume
- Complex business logic
  - Loop calls
  - Sequential calls
- Poorly designed thread pools
- Unreasonable lock design
- Machine problems (full GC, machine restarts, thread exhaustion)
Problem solving
1. Slow queries (MySQL)
1.1 Deep paging
The so-called deep paging problem relates to the principle of mysql paging. Typically, mysql pages are written like this:
select name,code from student limit 100,20
This query means: skip the first 100 rows, then fetch 20 (rows 101 through 120). This is fine when the page is shallow, but as the paging gets deeper, the SQL might look like this:
select name,code from student limit 1000000,20
At this point, MySQL has to read 1,000,020 rows and discard the first 1,000,000. With that much data to scan, it cannot possibly be fast. So what's the solution? In general, the best way is to add a condition:
select name,code from student where id>1000000 limit 20
This way, MySQL walks the primary key index, jumps directly to id 1,000,000, and reads just 20 rows from there. However, this method needs cooperation from the interface's callers: the maximum id from the previous page must be passed as a parameter, which carries communication costs (caller: "I'm not changing my code!").
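When the caller genuinely won't pass the last id, a deferred join is a common fallback; here is a sketch against the same student table (the exact speedup depends on the table and index layout): page through the slim primary key index first, then fetch only the 20 full rows:

```sql
select s.name, s.code
from student s
inner join (
    -- scan only the primary key index for the skipped rows
    select id from student order by id limit 1000000, 20
) t on s.id = t.id
```

The subquery still walks 1,000,020 index entries, but it never reads the wide rows for the discarded 1,000,000, which is usually much cheaper.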
1.2 Missing indexes
This is the easiest problem to fix. Run:
show create table xxxx
to view a table's indexes. There are plenty of guides online on writing index-creation statements, so I won't repeat them here. One aside: before adding an index, consider whether it is actually necessary; an index on a column with very low selectivity does little good. Also note that adding an index is an ALTER operation that may lock the table, so the SQL must be executed during off-peak hours (!!!)
1.3 Index Failure
This is the trickiest kind of slow query to analyze. MySQL provides explain to evaluate a given SQL query, including which index it uses, but when an index fails to be used, MySQL won't tell you why; you have to analyze that yourself. In general, the common (perhaps not exhaustive) causes of index failure include: a leading-wildcard LIKE ('%xx'), a function or computation applied to the indexed column, implicit type conversion, failing to match the leftmost prefix of a composite index, and poor field selectivity.
In particular, a field's selectivity (differentiation) should be evaluated before indexing it. If the selectivity is poor, the index should not be added at all. What does poor selectivity mean? A few examples:

- A field that can only take three values: its index selectivity is very low;
- A field that is largely empty, with only a few rows having values;
- A field whose values are highly concentrated: say 90% are 1, and the remaining 10% may be 2, 3, 4…
Even if none of the above causes apply, MySQL may still decide not to use an index. During SQL optimization, MySQL chooses what it considers the best plan, and its own cost estimate may conclude that using the index will not improve performance, so it abandons it. In this case, you can use the force index keyword to force the index to be used.
select name,code from student force index(xxxx) where name = 'xxx'
Here, xxxx is the index name.
1.4 Too many joins or subqueries
I'm grouping too many joins and too many subqueries together. In general, subqueries are not recommended; they can usually be optimized by rewriting them as joins. At the same time, a join should not touch too many tables; two or three is usually appropriate. How many tables is safe depends on the specific case: if each table holds very little data, say a few hundred thousand rows, you can join more of them; otherwise, fewer.
It's also worth mentioning that in most cases a join is done in memory; if the matched data is small or join_buffer is set large enough, it won't be slow. But when the join involves a large amount of data, MySQL falls back to creating temporary tables on disk to match the tables, which is obviously inefficient because disk IO is slow. This is why large joins drag performance down.
In this case, it is generally recommended to split the query at the code level: in the business layer, query the main table first, then use the associated field as a condition to query the related table, build a map from its results, and assemble the data in the business layer. With the indexes set up correctly, this is usually much faster than the join. After all, matching data in application memory beats disk IO, even after accounting for the extra network round-trips.
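A minimal sketch of the split-and-assemble step (the Student and SchoolClass row types and field names here are hypothetical stand-ins for real tables; the two lists represent the results of the two separate queries):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinInCode {
    // Hypothetical row types standing in for rows of two database tables.
    static class Student {
        final long id; final String name; final long classId;
        Student(long id, String name, long classId) {
            this.id = id; this.name = name; this.classId = classId;
        }
    }
    static class SchoolClass {
        final long id; final String className;
        SchoolClass(long id, String className) {
            this.id = id; this.className = className;
        }
    }

    // Instead of a SQL join: build a map from the related table,
    // then assemble the combined rows in the business layer.
    public static List<String> assemble(List<Student> students, List<SchoolClass> classes) {
        Map<Long, String> classNameById = new HashMap<>();
        for (SchoolClass c : classes) {
            classNameById.put(c.id, c.className);
        }
        List<String> result = new ArrayList<>();
        for (Student s : students) {
            result.add(s.name + " - " + classNameById.getOrDefault(s.classId, "unknown"));
        }
        return result;
    }
}
```

The map lookup is O(1) per row, so the assembly cost is linear in the number of rows returned.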
1.5 Too many IN elements
This problem is hard to spot from the code alone, and is best analyzed together with monitoring and database logs. If a query contains an IN, the IN condition is properly indexed, and the SQL is still slow, then too many IN elements is a strong suspect. Once the problem is identified, it is easy to solve: split the elements into batches and run one query per batch. If you want it faster still, introduce multithreading.
Furthermore, if the number of IN elements is so large that even batching can't make it fast, it's best to impose a hard limit:
select id from student where id in (1,2,3......1000) limit 200
Of course, it’s best to set limits at the code level
if (ids.size() > 200) {
    throw new IllegalArgumentException("no more than 200 ids in one query");
}
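The batching idea from above can be sketched like this (partition is a hypothetical helper; Guava's Lists.partition does the same job):

```java
import java.util.ArrayList;
import java.util.List;

public class InBatcher {
    // Split a large id list into batches of at most batchSize,
    // so that each IN (...) clause stays small.
    public static <T> List<List<T>> partition(List<T> ids, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            batches.add(ids.subList(i, Math.min(i + batchSize, ids.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) ids.add(i);
        // 1000 ids in batches of 200 -> 5 queries instead of one huge IN clause
        List<List<Integer>> batches = partition(ids, 200);
        System.out.println(batches.size());
    }
}
```

Each batch then becomes one query, and the batches can be dispatched to a thread pool if you want them to run in parallel.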
1.6 The data volume is simply too large
This is a problem that can't be solved by tinkering with code alone; it requires changing the entire data storage architecture. The usual remedies are sharding: splitting the data across databases and tables; or replacing MySQL outright with a database designed for big data. Work like this is systems engineering: it needs rigorous investigation, solution design, design review, performance evaluation, development, testing, and integration, plus equally rigorous plans for data migration, rollback, degradation, and failure handling. Beyond the internal team's work there may also be cross-system communication, since after changes this major, the way downstream systems call the interface may need to change.
For reasons of space I won't expand on this here. I had the chance to take part in sharding a table with hundreds of millions of rows and gained a deep appreciation of how complex the whole process is; I'll share it later if I get the opportunity.
2. Complex business logic
2.1 Loop calls
This scenario calls the same code in a loop, each iteration running identical logic on different inputs. For example, say we want to initialize a list that presets 12 months of data for the front end:
List<Model> list = new ArrayList<>();
for (int i = 0; i < 12; i++) {
    Model model = calOneMonthData(i);
    list.add(model);
}
It is obvious that the monthly data calculation is independent of each other, so we can do it in a multi-threaded way:
// Create the thread pool once, outside the method; do not create a new pool every time the code runs.
public static ExecutorService commonThreadPool = new ThreadPoolExecutor(5, 5, 300L,
        TimeUnit.SECONDS, new LinkedBlockingQueue<>(10),
        commonThreadFactory, new ThreadPoolExecutor.DiscardPolicy());

// Inside the method:
List<Future<Model>> futures = new ArrayList<>();
for (int i = 0; i < 12; i++) {
    final int month = i; // the lambda needs an effectively final variable
    futures.add(commonThreadPool.submit(() -> calOneMonthData(month)));
}
List<Model> list = new ArrayList<>();
try {
    for (Future<Model> future : futures) {
        list.add(future.get());
    }
} catch (Exception e) {
    logger.error("error: ", e);
}
2.2 Sequential calls
If the calls are not a loop like the one above but a plain sequence, and some of the calls have no result dependence on each other, then we can also use multiple threads. Consider this code:
A a = doA();
B b = doB();
C c = doC(a, b);
D d = doD(c);
E e = doE(c);
return doResult(d, e);
We can solve this with CompletableFuture:
CompletableFuture<A> futureA = CompletableFuture.supplyAsync(() -> doA());
CompletableFuture<B> futureB = CompletableFuture.supplyAsync(() -> doB());
C c = doC(futureA.join(), futureB.join());
CompletableFuture<D> futureD = CompletableFuture.supplyAsync(() -> doD(c));
CompletableFuture<E> futureE = CompletableFuture.supplyAsync(() -> doE(c));
return doResult(futureD.join(), futureE.join());
This way, A and B execute in parallel, and D and E execute in parallel; each stage only waits as long as the slower of its pair.
3. Thread pools are poorly designed
What might be going on when the interface is still not fast enough even though we are already using a thread pool to process tasks in parallel?
In this case, suspect the thread pool design first. It's worth reviewing the three key parameters of a thread pool: the core pool size, the maximum pool size, and the wait queue. They work together like this: when the pool is created and not pre-warmed, it holds zero threads; as tasks are submitted, core threads are created to run them.
When the core thread is fully occupied, if another task arrives, the task is queued to wait.
If the queue is also full, a non-core thread is created to run.
If the total number of threads reaches the maximum number and still a task arrives, discarding starts according to the thread pool discarding rule.
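The growth rules above can be observed directly. A small sketch (the parameter values are chosen purely for illustration): with core size 2, maximum size 4, and a queue of capacity 2, submitting 6 blocking tasks puts 2 on core threads, 2 in the queue, and forces 2 non-core threads:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    // Returns {pool size, queued task count} after submitting 6 blocking tasks.
    public static int[] fillPool() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 60L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 6; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // block so each task keeps its thread busy
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The grow-or-queue decision happens on the submitting thread, so this is
        // deterministic: tasks 1-2 create core threads, 3-4 wait in the queue,
        // 5-6 create non-core threads up to the maximum of 4.
        int[] state = { pool.getPoolSize(), pool.getQueue().size() };
        release.countDown();
        pool.shutdown();
        return state;
    }
}
```

Running fillPool() yields a pool size of 4 with 2 tasks queued, matching the rules described above.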
So what does this operation principle have to do with interface runtime?
- The core pool size is set too small, so the desired parallelism is never reached
- The thread pool is shared with other business logic; its long-running tasks occupy the core threads, so when this service's tasks arrive they go straight into the wait queue
- There are simply too many tasks; they fill the pool and large numbers of tasks sit waiting in the queue
During troubleshooting, once the cause of the problem is found, the solution is clear: adjust the thread pool parameters, split the thread pool according to business, and so on.
4. The lock design is unreasonable
Unreasonable lock design comes in two flavors: using the wrong kind of lock, or locking too coarsely.
A typical wrong-lock-type scenario is one that calls for a read-write lock: reads may share access with other reads, but a read must not overlap a write to the shared variable, and while a write is in progress nothing may read or write. Where a read-write lock would do, using a plain mutex instead greatly reduces efficiency in scenarios with far more reads than writes.
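A minimal sketch of the read-write pattern (CachedConfig and its field are hypothetical example names):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CachedConfig {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String value = "initial";

    // Any number of readers can hold the read lock at the same time.
    public String read() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it blocks all readers and other writers.
    public void write(String newValue) {
        lock.writeLock().lock();
        try {
            value = newValue;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With a plain synchronized block, every read would also exclude every other read; here, concurrent reads proceed without blocking each other.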
Locking too coarsely is another common case of unreasonable lock design. If the lock covers too much code, it is held too long. For example:
public synchronized void doSome() {
File f = calData();
uploadToS3(f);
sendSuccessMessage();
}
This logic does three things: computes a result, uploads it, and sends a message. Clearly, uploading the result and sending the message have nothing to do with shared variables and don't need the lock. So it could instead be:
public void doSome() {
File f = null;
synchronized(this) {
f = calData();
}
uploadToS3(f);
sendSuccessMessage();
}
5. Machine problems (full GC, machine restarts, thread exhaustion)
There are many possible causes here: full GC triggered by oversized scheduled batch jobs, high RSS memory usage caused by thread leaks in code, machine restarts for any number of other reasons. These need to be analyzed case by case with the help of monitoring, and then addressed by splitting large transactions, redesigning thread pools, and so on.
6. Cure-all solutions
I picked up this label from a teacher at my company, and it fits well. These cure-all solutions tend to fix the majority of slow-interface problems and are often the ultimate answer to interface efficiency. When we really can't track down the cause, or there's no optimization headroom left, we can reach for this kind of remedy.
6.1 Caching
Caching trades space for time: a copy of the data is kept on a high-performance medium such as memory or SSD. When a request arrives, we read from the cache first, and only fall back to disk or the network on a miss. Since memory and SSD are far more efficient than disk or network IO, the interface responds much faster. Caching suits scenarios where reads far outnumber writes and the data changes infrequently. In terms of technology choices, there are these:
- Local caches: a simple map, or local cache toolkits such as guava
- Cache middleware: redis, tair, or memcached
Of course, memcached is rarely used these days because it falls short of redis in most respects. tair is distributed cache middleware developed by Alibaba; its advantage is that, in theory, it can grow its storage capacity dynamically while staying in service, which suits large-volume cache storage. It clearly beats single-node redis, but how it compares with a scalable redis cluster needs further investigation.
Furthermore, today's caches generally use a key-value model, and how to design keys for a high hit ratio is a real question in itself: a good key design and a bad one differ enormously in performance. There is no fixed rule for key design; it has to be analyzed against the specific business scenario. Cache design is the kind of topic that fills the longest engineering articles at major companies.
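For the simplest local-cache option above, here is a minimal sketch built on LinkedHashMap's access-order mode (the class name and capacity are illustrative; production code would more likely use guava or caffeine):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true: iteration order follows access recency, giving LRU behavior
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once we exceed capacity
        return size() > capacity;
    }
}
```

Note that, unlike redis or tair, a local cache like this is per-JVM: each instance of the service holds its own copy, and it offers no expiry or persistence.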
6.2 Callback or Reverse Check
This approach is often a business-level solution, widely used in order and payment systems. For example, when making a payment we call a dedicated payment system interface, which, after a series of validation and persistence steps, also calls the bank's interface to execute the payment. Because payment requirements are strict, the bank-side implementation may be slow, dragging down the performance of the whole payment interface. In this case we can adopt a fast-success approach: once the necessary validation and persistence are complete, return success immediately and report an intermediate state, "payment in progress", to the caller. Then call the bank interface, and once the payment result is known, call the upstream system's callback interface with the final result, "success" or "failure". This lets the payment run asynchronously and improves the payment interface's efficiency. Of course, if the callback interfaces turn out to be inconsistent across multiple connected business parties, you can publish the results to Kafka and have each caller listen for its own results.
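The fast-success flow can be sketched as follows (pay, callBank, and the callback wiring are all hypothetical names for illustration, not a real payment API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class FastSuccessPay {
    // Return immediately after validation/persistence; finish the bank call asynchronously.
    public static String pay(String orderId, Consumer<String> resultCallback) {
        // ... necessary validation and persistence would happen here ...
        CompletableFuture.runAsync(() -> {
            String bankResult = callBank(orderId);   // the slow bank call runs in the background
            resultCallback.accept(bankResult);       // notify the upstream system via its callback
        });
        return "PROCESSING";                          // intermediate state returned to the caller
    }

    // Stand-in for the real (slow) bank interface.
    static String callBank(String orderId) {
        return "SUCCESS";
    }
}
```

The caller gets "PROCESSING" back right away, and the final "SUCCESS" or "FAILURE" arrives later through whatever callback (or Kafka topic) the business parties agreed on.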
Conclusion
This article is a brief summary of the performance optimization problems I've encountered at work. It is surely incomplete; discussion and exchange are welcome. In the meantime, please comment, like, and share!