I recently optimized a search interface. It took four rounds of load testing before it finally met the requirements, so I am recording the process here, and treating myself to an extra chicken leg for dinner tonight.

The business logic

The data is retrieved from OpenSearch, then goes through various assembly and population steps, and is finally returned.

The logic looks very simple, and I thought so too at the beginning, so I estimated 5 days. In the end it took about 10 days of development, joint debugging, and bug fixing before it went online (of course, I was not working only on this during that time).

The complexity lies in the many factors that affect the returned structure. Troubleshooting means checking the configuration, the database, the cache, OpenSearch, and the code.

Anyway, no matter how complicated the logic is, that is no excuse for avoiding a problem and no reason to skip optimization. But that is not the point of this article.

The point is this: interfaces provided to the app are generally required to respond in under 100 ms.

First load test

The average response time was 150 ms, and other problems surfaced during this load test. The backend reported an error: the OpenSearch queries-per-second limit had been reached.

Optimize code and configuration

1. Modify the OpenSearch configuration, and change the OpenSearch connection address in the load test environment to the intranet address

2. Replace the cache queries that ran inside loops with a single batch query

3. Remove unused code from the project after confirming with the relevant colleagues
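
A minimal sketch of change 2, to show why it matters. CountingCache and ProductNameLoader are hypothetical names, and the cache here is an in-memory stand-in that only counts round trips; it is not the project's real cache client. With a real Redis client, the batch call would correspond to something like MGET or a pipelined request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a remote cache client; it only counts round trips.
class CountingCache {
    private final Map<String, String> store = new HashMap<>();
    private int roundTrips = 0;

    void put(String key, String value) {
        store.put(key, value);
    }

    // Models one network round trip per key.
    String get(String key) {
        roundTrips++;
        return store.get(key);
    }

    // Models one network round trip for the whole batch. Values come back
    // in key order, with null for keys that are missing.
    List<String> multiGet(List<String> keys) {
        roundTrips++;
        List<String> values = new ArrayList<>(keys.size());
        for (String key : keys) {
            values.add(store.get(key));
        }
        return values;
    }

    int roundTrips() {
        return roundTrips;
    }
}

class ProductNameLoader {
    // Before: a cache query inside the loop, so round trips grow with the list size.
    static List<String> loadOneByOne(CountingCache cache, List<String> keys) {
        List<String> names = new ArrayList<>();
        for (String key : keys) {
            names.add(cache.get(key));
        }
        return names;
    }

    // After: collect the keys first, then issue a single batch query.
    static List<String> loadInBatch(CountingCache cache, List<String> keys) {
        return cache.multiGet(keys);
    }
}
```

Both methods return the same data, but the first costs one round trip per key while the second costs one round trip in total, which is what keeps the response time stable when the result list grows.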

Second load test

We had optimized the code and changed the configuration, but the results got worse, and there were new problems to fix.

At that point I reviewed the code to confirm that the number of cache queries was already minimal, and adjusted the connection pool and thread pool parameters to relatively large but reasonable values.

If the load test still fails, the last resort is to cache the result set.

That is, the user ID and search keyword together form the key, the query result is the value, and the entry is cached for 5 minutes.
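
The idea can be sketched roughly as follows. This is an in-memory stand-in, assuming a key format of "search:" + user ID + ":" + keyword and a result type of List<String>. SearchResultCache, cacheKey and getOrLoad are hypothetical names, and a real deployment would more likely store the entry in Redis with an expiry instead of in a local map.

```java
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical in-memory result cache with a fixed time to live.
class SearchResultCache {
    private static final class Entry {
        final List<String> results;
        final long expiresAtMillis;

        Entry(List<String> results, long expiresAtMillis) {
            this.results = results;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so the expiry logic can be tested

    SearchResultCache(Duration ttl, LongSupplier clock) {
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    // Assumed key format: user ID and keyword joined with a separator.
    static String cacheKey(long userId, String keyword) {
        return "search:" + userId + ":" + keyword;
    }

    // Returns the cached result if it is still fresh, otherwise runs the
    // real query (the loader) and caches what it returns.
    List<String> getOrLoad(long userId, String keyword, Supplier<List<String>> loader) {
        String key = cacheKey(userId, keyword);
        long now = clock.getAsLong();
        Entry cached = entries.get(key);
        if (cached != null && now < cached.expiresAtMillis) {
            return cached.results;
        }
        List<String> fresh = loader.get();
        entries.put(key, new Entry(fresh, now + ttlMillis));
        return fresh;
    }
}
```

A caller would create it with something like new SearchResultCache(Duration.ofMinutes(5), System::currentTimeMillis) and pass the OpenSearch query as the loader. Two simplifications to keep in mind: concurrent misses on the same key may each run the query, and expired entries stay in the map until they are overwritten.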

Third load test

This time the requirements were finally met: the response time came down to 32 ms at a concurrency of 60. I also found a new optimization point.

The interface, unexpectedly, was still querying the database, which is unacceptable. After investigating, I removed some unnecessary dependencies.

Growth

I learned to use executePipelined of RedisTemplate for Redis batch queries.

Summary of this optimization

1. Absolutely avoid querying the database or the cache inside a loop (PS: a loop must not contain cache queries, and even less database queries, because the number of iterations cannot be controlled)

2. API interfaces should generally read straight from the cache and not query the database

3. Prefer batch queries over single queries, and try to fetch everything in one go

4. When using Alibaba Cloud, pay attention to the configuration of the products involved and spend the money that needs to be spent. Also remember to use the intranet address of those products in the production environment

5. Pay attention to pool sizes (database connection pool, Redis connection pool, thread pool)

6. Do not deploy other services on the machine under load test; run only the service being tested, so that other projects do not affect it. In production, it is best to have only one important service deployed on a single machine

7. Clean up unused code, commented-out code, and unused dependencies promptly

8. Clustering goes without saying

9. Monitoring tools help locate problems, for example distributed tracing; this project used PinPoint

10. If there is very little room left for technical optimization, try starting from the business side and let real data speak: daily traffic and historical traffic data can be used to convince the testers

11. Every code change can introduce new problems, so regression test after every change (PS: Postman and Beyond Compare come in handy here; after each change I search with several different sets of keywords and check that the data returned before and after the change is the same)

12. Add plenty of logging at key places to make future troubleshooting easier, because investigating online problems relies mainly on logs
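
Point 5 can be made a little more concrete for the thread pool case. The sketch below is only an illustration built on the JDK's ThreadPoolExecutor; SearchExecutors and newSearchPool are hypothetical names, and all the numbers are placeholders that should come out of load testing, not values from this project.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SearchExecutors {
    // Builds a pool whose limits are all explicit, so they can be tuned
    // between load test rounds instead of relying on defaults.
    static ThreadPoolExecutor newSearchPool(int coreThreads, int maxThreads, int queueCapacity) {
        return new ThreadPoolExecutor(
                coreThreads,
                maxThreads,
                60L, TimeUnit.SECONDS,                       // idle threads above the core size are released after 60 s
                new ArrayBlockingQueue<>(queueCapacity),     // bounded queue, so the backlog cannot grow without limit
                new ThreadPoolExecutor.CallerRunsPolicy());  // when saturated, the submitting thread runs the task itself
    }
}
```

For example, SearchExecutors.newSearchPool(4, 16, 100) gives a pool of 4 to 16 threads with at most 100 queued tasks. The bounded queue and the rejection policy are a design choice: under overload the caller slows down visibly, which tends to show up in a load test more clearly than a silently growing queue.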

From: www.cnblogs.com/cjsblog/p/1…