Overview


Recently, our company's order-placement interface had become slow, and my boss was worried it would not hold up under Double 11 traffic, so he asked me to optimize it. The condition was that there be no major changes: the ordering interface is very complicated, large changes would be risky, and the development and testing costs would be high. I enjoy challenging tasks like this because I learn a lot while solving them.

At the time, I only knew that the ordering interface was slow; nobody could tell me where the time was going, that is, which bottlenecks were causing it. That does not matter, because load testing can pinpoint the bottleneck.

The rest of this post describes the problems found during load testing, how they were solved, and which tools were used along the way.


Tools and environment


Tools

  • JMeter
  • JVisualVM (bundled with the JDK)
  • JMX
  • nmon

Environment

  • Tencent Cloud MySQL

  • One Tencent Cloud server (2 cores, 4 GB RAM)


Finding the bottlenecks


Placing an order is a write operation. In most cases the bottleneck for such an interface is the database, and the application may simply be waiting for a DB lock to be released. To test this hypothesis, we can use JMeter together with JVisualVM.

To monitor the Java process on the server, we need to enable JMX by adding the following parameters when the process is started:

JMX_OPTS="-Dcom.sun.management.jmxremote.port=7969 -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=xx.xx.xx.xx"

nohup java ${JMX_OPTS} -jar xxxxx.jar

-Djava.rmi.server.hostname should be set to the IP address of the server running the Java process, and -Dcom.sun.management.jmxremote.port specifies the JMX monitoring port, which is 7969 here.
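JVisualVM handles the connection for you, but the same endpoint can also be reached from code. The following is a minimal sketch, assuming the JVM was started with the flags above and that the port is reachable from your machine. The class name JmxThreadCount and the choice to read only the thread count are illustrative, not part of the original setup.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class JmxThreadCount {

    /** Builds the conventional service URL for the JDK's RMI-based JMX connector. */
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Connects to the remote JVM and returns its current live thread count. */
    static int remoteThreadCount(String host, int port) throws IOException {
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Proxy for the remote JVM's ThreadMXBean.
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            return threads.getThreadCount();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java JmxThreadCount.java <host> 7969
        if (args.length == 2) {
            System.out.println(remoteThreadCount(args[0], Integer.parseInt(args[1])));
        } else {
            System.out.println(serviceUrl("localhost", 7969));
        }
    }
}
```

If the connection is refused, the usual suspects are a firewall in front of the port or a -Djava.rmi.server.hostname value that the client cannot reach.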

After restarting the process, open JVisualVM locally (I use Windows 10) and add a JMX connection. Once it is connected, open the Threads tab, because we are going to take thread dumps to observe what the threads are doing.

Now we can use JMeter to load-test the single interface. Start with 50 concurrent threads and a run time of 1 minute.
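To make the "N threads for a fixed duration" idea concrete, here is a small stand-in written in plain Java. It is not JMeter and is not what was used for the measurements in this post; MiniLoadTest, measureTps and the sleeping task are all made up for illustration. It only shows how a TPS figure comes out of a thread count and a time window.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class MiniLoadTest {

    /** Runs task on `threads` workers for about `durationMillis` and returns completed calls per second. */
    static double measureTps(int threads, long durationMillis, Runnable task) {
        AtomicLong completed = new AtomicLong();
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                // Each worker calls the task back to back until the deadline passes.
                while (System.nanoTime() - deadline < 0) {
                    task.run();
                    completed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(durationMillis + 10_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        double elapsedSeconds = (System.nanoTime() - start) / 1e9;
        return completed.get() / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical task: a 10 ms sleep standing in for one call to the ordering interface.
        Runnable fakeOrder = () -> {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("TPS: " + measureTps(50, 2_000, fakeOrder));
    }
}
```

A real tool such as JMeter adds ramp-up, latency percentiles and error reporting on top of this basic loop.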

During the load test, take a thread dump and, at the same time, use nmon to watch the CPU load on the application server.
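A thread dump can also be filtered in code. The sketch below uses the JDK's ThreadMXBean to list threads that are currently blocked on a monitor, together with the lock they wait for and its owner. It inspects the local JVM; BlockedThreadReport and the demo thread name order-worker are illustrative. Note that it only catches Java monitor contention, not threads that are waiting on a database response.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

final class BlockedThreadReport {

    /** Returns one line per thread that is currently BLOCKED on a monitor. */
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // Locked-monitor details can only be requested when the JVM supports them.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    /** Starts a thread that blocks on a monitor held by the caller, then reports it. */
    static List<String> demo() {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // Nothing to do; the point is having to wait for the monitor.
            }
        }, "order-worker");
        waiter.setDaemon(true);
        synchronized (lock) {
            waiter.start();
            // Spin until the worker is actually parked on the monitor.
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.yield();
            }
            return blockedThreads();
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The same ThreadMXBean interface is available over a remote JMX connection, so a check like this could in principle be pointed at the application server.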

The load was very low, and CPU usage still did not rise after the concurrency was raised to 100 threads, so we can tentatively conclude that threads are blocked on a lock. The thread dump shows the following:

- locked <22f6e7f3> (a com.mysql.cj.core.io.ReadAheadInputStream)
- at com.sun.proxy.$Proxy231.reduceSkuStock(Unknown Source)

The business method on this stack is reduceSkuStock. Reading the code shows that reduceSkuStock is called inside one large transaction:

@Transactional(rollbackFor = {Exception.class})
public void createOrder() {
    // 1. Deduct stock
    reduceSkuStock();
    // 2. Create the order
    insertOrder();
    // 3. Other write operations...
}

Stock is usually kept in a separate inventory table. Because createOrder() is one large transaction, the row lock on the stock record is not released until the whole createOrder() method has finished, and during that time no other thread can write to that record. We can therefore run reduceSkuStock() in its own transaction, so that the lock is released as soon as the stock record has been updated, which should improve performance. To verify that the transaction is what makes the ordering interface slow, we can simply remove the transaction from createOrder() and test again.
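The effect of lock scope can be illustrated without a database. In the sketch below a ReentrantLock stands in for the row lock on one stock record and Thread.sleep stands in for the work. The timings (20 ms for the stock update, 60 ms for everything else) and the class name LockScopeDemo are assumptions chosen for illustration, not measurements from the real system.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

final class LockScopeDemo {
    static final long STOCK_MILLIS = 20;   // assumed time to update the stock row
    static final long OTHER_MILLIS = 60;   // assumed time for the remaining writes

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Runs `threads` simulated orders against one SKU and returns the elapsed milliseconds. */
    static long run(int threads, boolean lockHeldForWholeOrder) {
        ReentrantLock rowLock = new ReentrantLock();   // stands in for the DB row lock
        CountDownLatch done = new CountDownLatch(threads);
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                rowLock.lock();
                try {
                    pause(STOCK_MILLIS);               // reduceSkuStock()
                    if (lockHeldForWholeOrder) {
                        pause(OTHER_MILLIS);           // insertOrder() etc. in the same transaction
                    }
                } finally {
                    rowLock.unlock();                  // "commit": the row lock is released
                }
                if (!lockHeldForWholeOrder) {
                    pause(OTHER_MILLIS);               // remaining work no longer blocks other orders
                }
                done.countDown();
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("lock held for whole order: " + run(5, true) + " ms");
        System.out.println("lock held for stock only:  " + run(5, false) + " ms");
    }
}
```

With the lock held for the whole order, the five orders run strictly one after another. With the lock held only for the stock step, only that step is serialized and the rest overlaps.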

The load test showed that TPS for the single interface doubled and CPU usage rose considerably, but the result was still not ideal, so there had to be other locks in the code. The thread dump revealed another one:

- locked <438be230> (a org.apache.http.pool.AbstractConnPool$2)
- at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)


HttpClient's execute method is waiting for a free connection from the connection pool. A quick fix is to enlarge the pool:

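 // Use a pooled connection manager and raise the per-route connection limit.
 PoolingHttpClientConnectionManager pool = new PoolingHttpClientConnectionManager();
 pool.setDefaultMaxPerRoute(400);
 // Note: in HttpClient 4.x the total pool size defaults to 20; setMaxTotal raises it if needed.
 httpClient = HttpClients.custom().setConnectionManager(pool).build();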
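Why a small pool throttles throughput can be shown with a Semaphore standing in for the connection pool. This is a simulation, not HttpClient code: PoolLimitDemo, the 20 ms "remote call" and the caller counts are illustrative assumptions. The pool size of 2 in the usage example mirrors the per-route default in HttpClient 4.x.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolLimitDemo {

    /** Sends `callers` simulated requests through a pool of `poolSize` connections; returns peak concurrency. */
    static int peakInFlight(int callers, int poolSize) {
        Semaphore connections = new Semaphore(poolSize);  // stands in for the connection pool
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(callers);
        for (int i = 0; i < callers; i++) {
            new Thread(() -> {
                try {
                    connections.acquire();                // blocks while the pool is exhausted
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        Thread.sleep(20);                 // simulated remote call
                    } finally {
                        inFlight.decrementAndGet();
                        connections.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("pool of 2:  peak " + peakInFlight(50, 2) + " requests in flight");
        System.out.println("pool of 50: peak " + peakInFlight(50, 50) + " requests in flight");
    }
}
```

However many application threads are running, the number of requests in flight never exceeds the pool size, and every other thread shows up in the dump as waiting on the pool.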

After another load test, no more lock contention showed up in the thread dump, and TPS had increased fivefold. A few things still remain to be done:

1. Print every SQL statement the interface executes and run EXPLAIN on each one to check for full table scans or statements that do not use an index.
2. Observe how often full GC occurs while the interface is under load.
3. Increase the number of MySQL connections available to the application.
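For the second item, GC activity can be read from the JVM's GarbageCollectorMXBeans, which JVisualVM also displays. The following is a minimal sketch, assuming you take one snapshot before and one after a load-test run. GcSnapshot and its methods are illustrative names, and which collector name corresponds to full GC depends on the garbage collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

final class GcSnapshot {

    /** Returns collector name -> cumulative collection count for the current JVM. */
    static Map<String, Long> counts() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.put(gc.getName(), gc.getCollectionCount());   // -1 if the count is undefined
        }
        return result;
    }

    /** Subtracts an earlier snapshot from a later one to get the collections in between. */
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            result.put(e.getKey(), e.getValue() - before.getOrDefault(e.getKey(), 0L));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> before = counts();
        System.gc();   // only a hint; the JVM may ignore it
        System.out.println(delta(before, counts()));
    }
}
```

Dividing each delta by the length of the run gives the collection frequency per collector.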

With that done, we can go back and fix the stock problem. Since the boss said there could be no major changes, I made the reduceSkuStock method run in a new transaction:

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void reduceSkuStock() {}

The row lock is now released as soon as the thread finishes the stock operation. In the development environment, TPS for the single interface improved about threefold after tuning; the development database and application server are low-spec, which also limits TPS. After the optimization, a load test in production showed a tenfold increase in TPS.
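One detail is easy to miss with this approach. By default Spring applies @Transactional through a proxy, so REQUIRES_NEW only takes effect when reduceSkuStock is called through that proxy; the com.sun.proxy.$Proxy231 frame in the thread dump suggests that is the case here. The sketch below reproduces the mechanism with a plain JDK dynamic proxy and no Spring at all. OrderService, OrderServiceImpl and ProxyDemo are hypothetical names, and the recorded method name marks the spot where a transaction interceptor would do its work.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface OrderService {
    void createOrder();
    void reduceSkuStock();
}

final class OrderServiceImpl implements OrderService {
    // Points at the object itself unless replaced with the proxy.
    OrderService self = this;

    @Override
    public void createOrder() {
        self.reduceSkuStock();
    }

    @Override
    public void reduceSkuStock() {
        // The stock update would go here.
    }
}

final class ProxyDemo {

    /** Wraps target in a JDK proxy that records the name of each intercepted call. */
    static OrderService intercept(OrderService target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(method.getName());   // a transaction interceptor would begin/commit here
            return method.invoke(target, args);
        };
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] {OrderService.class}, handler);
    }

    /** Returns the intercepted calls for one createOrder(), with or without routing through the proxy. */
    static List<String> interceptedCalls(boolean callThroughProxy) {
        List<String> log = new ArrayList<>();
        OrderServiceImpl impl = new OrderServiceImpl();
        OrderService proxy = intercept(impl, log);
        if (callThroughProxy) {
            impl.self = proxy;
        }
        proxy.createOrder();
        return log;
    }

    public static void main(String[] args) {
        System.out.println("direct internal call: " + interceptedCalls(false));
        System.out.println("call through proxy:   " + interceptedCalls(true));
    }
}
```

When the internal call bypasses the proxy, only createOrder is intercepted, which in Spring terms means reduceSkuStock would silently join the outer transaction. It is also worth checking the failure path: a REQUIRES_NEW stock update commits on its own, so an exception later in createOrder() does not roll it back.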


Conclusion


This is an optimization approach for the case where the logic of the ordering interface cannot be changed much. Ideally, stock operations would live in a separate service that can be optimized on its own, and the ordering logic itself could also be simplified.