This article is participating in “Java Theme Month – Java Debug Notes Event”, see < Event link > for more details.

CPU spike in an online service

Preface

  • Completing the functionality is only the first step in a project's life cycle; whether a project is really well built only shows at runtime.
  • Today we are going to look at a problem I ran into earlier: a CPU spike. At the code level there was nothing wrong with the functionality, but it gave me a real headache once it went into production.

Problem description

  • Using the system's data-entry function triggers notifications with the relevant messages in the global monitoring view. When this happened, the server CPU soared to 300%.

Locating the problem

  • First of all, let's sort out a simple schematic of how data is transmitted over WebSocket. Locating a problem usually starts with asking what our logic actually is.
  • When a client starts, besides establishing a connection with the WebSocket service, it also needs to register with the WebSocket service which interfaces it wants real-time data from.
  • I store the interface signature information in a Map inside the code. These interfaces are then bound to the client during client registration.
  • When our listener detects that the data has changed, it sends the latest data to the clients bound to the relevant interface.
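
The registration flow described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the class name `SubscriptionRegistry`, its methods, and the use of plain strings for client ids and interface signatures are all assumptions.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: maps an interface signature to the clients subscribed to it.
class SubscriptionRegistry {
    private final Map<String, Set<String>> clientsBySignature = new ConcurrentHashMap<>();

    // Called when a client registers an interface it wants real-time data from.
    void register(String clientId, String interfaceSignature) {
        clientsBySignature
                .computeIfAbsent(interfaceSignature, key -> ConcurrentHashMap.newKeySet())
                .add(clientId);
    }

    // Called when a client disconnects, so that stale ids are no longer pushed to.
    void unregister(String clientId) {
        clientsBySignature.values().forEach(clients -> clients.remove(clientId));
    }

    // Snapshot of the clients to notify when this interface's data changes.
    Set<String> subscribers(String interfaceSignature) {
        Set<String> clients = clientsBySignature.get(interfaceSignature);
        return clients == null ? Collections.<String>emptySet() : new HashSet<>(clients);
    }
}
```

With a structure like this, the listener only has to look up `subscribers(signature)` for the interface whose data changed and push to those clients.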

Locating the business logic

  • The business side is easy to locate: the problem is in our listener. A CPU spike occurs when the listener detects a data change and sends the latest data of the subscribed interfaces to the WebSocket clients. The spike lasts quite a while and only comes back down after about a minute.
  • This is clearly a problem with the way we push messages

Look past the business to the essence

  • To be a qualified programmer, you have to look past the business to get anywhere. The business is only the shell around our code, and whatever goes wrong is basically our own problem. We have fewer than 10,000 online users, which should not be a problem at this level of concurrency. If something goes wrong, there must be a flaw in our programming logic.

  • Our message-sending code is simple. It obtains all clients that meet the sending conditions and then pushes to each of them through the client's sendMessage method.
  • But this time the message is our interface information. Internally, a reflective call is made based on the method signature saved by the client to get the latest data, which is then pushed to the client.
  • At the heart of this code is WebsocketManager.messageParse, which fetches the message and sends it. The fetched message is parsed into a RESTful format.
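
To make the push path concrete, here is a minimal sketch of "look up a method by its saved name and invoke it via reflection for each client". It is not the original code: `StatsService`, `FakeClient`, and `NaivePusher` are hypothetical stand-ins, and the real project resolves full method signatures rather than a bare method name.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business interface whose data clients subscribe to.
class StatsService {
    int calls = 0; // counts how often the data is actually fetched

    public String latestStats() {
        calls++;
        return "{\"total\":42}";
    }
}

// Hypothetical stand-in for a connected WebSocket client.
class FakeClient {
    final List<String> received = new ArrayList<>();

    void sendMessage(String payload) {
        received.add(payload);
    }
}

class NaivePusher {
    // Resolves and invokes the method once PER CLIENT, as in the flow described above.
    static void push(List<FakeClient> clients, Object target, String methodName) {
        for (FakeClient client : clients) {
            try {
                Method method = target.getClass().getMethod(methodName);
                Object data = method.invoke(target);
                client.sendMessage(String.valueOf(data));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot invoke " + methodName, e);
            }
        }
    }
}
```

Every client triggers its own method lookup and its own reflective call, even though all clients subscribed to the same interface receive the same payload.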

  • Inside this method we have four built-in parsing methods. Here we only need to care about RequestMappingMessageParseHandlerImpl, which handles this protocol.

  • We don't have to worry too much about our internal protocols here; they are our own design. As the diagram above shows, RequestMappingMessageParseHandlerImpl is the core.
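
For readers unfamiliar with this kind of handler setup, the sketch below shows one common way to chain parse handlers so that each one deals with a single protocol. The interface `ParseHandler`, the two handlers, and the `mapping:` prefix are invented for illustration and say nothing about how the real handlers are written.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical handler contract: each handler understands one message protocol.
interface ParseHandler {
    boolean supports(String message);

    String parse(String message);
}

// Hypothetical handler for messages that name an interface to call.
class MappingHandler implements ParseHandler {
    private static final String PREFIX = "mapping:";

    public boolean supports(String message) {
        return message.startsWith(PREFIX);
    }

    public String parse(String message) {
        return "data for " + message.substring(PREFIX.length());
    }
}

// Hypothetical fallback handler that passes plain text through unchanged.
class TextHandler implements ParseHandler {
    public boolean supports(String message) {
        return true;
    }

    public String parse(String message) {
        return message;
    }
}

class ParseChain {
    private final List<ParseHandler> handlers;

    ParseChain(ParseHandler... handlers) {
        this.handlers = Arrays.asList(handlers);
    }

    // Walks the chain in order; the first handler that supports the message handles it.
    String parse(String message) {
        for (ParseHandler handler : handlers) {
            if (handler.supports(message)) {
                return handler.parse(message);
            }
        }
        throw new IllegalArgumentException("No handler for: " + message);
    }
}
```

Because each protocol lives in its own handler, a performance problem in one of them can be investigated without touching the others.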

Root cause

  • Above we briefly walked through the code logic.
  • Looking at it carefully, we traverse all the clients and, for each one, call the interface via reflection to get the data to return. In fact, we don't need to fetch the data separately for each client when a message is pushed. We could fetch the data once and then send it while traversing the clients.
  • This is what drives the CPU usage so high. Of our 10,000 users, 5,000+ may be online at the same time, so a single push needs more than 5,000 reflective calls, which we can't handle. That's why the beginning of this article says that completing the functionality is only the first step.
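
The difference between the two approaches is easy to see by counting data fetches. The following is a small, self-contained sketch under assumed names (`CountingFetcher`, `PushCost`); a plain `Supplier` stands in for the reflective interface call, and incrementing a counter stands in for sending to a client.

```java
import java.util.function.Supplier;

// Hypothetical data source that counts how often it is asked for data.
class CountingFetcher implements Supplier<String> {
    int calls = 0;

    @Override
    public String get() {
        calls++;
        return "latest-data";
    }
}

class PushCost {
    // Per-client fetch: one data call for every connected client.
    static int pushPerClient(int clientCount, Supplier<String> fetch) {
        int sent = 0;
        for (int i = 0; i < clientCount; i++) {
            String data = fetch.get(); // repeated work, same result every time
            if (data != null) {
                sent++; // stands in for client.sendMessage(data)
            }
        }
        return sent;
    }

    // Fetch once, then only traverse the clients to send.
    static int pushFetchOnce(int clientCount, Supplier<String> fetch) {
        String data = fetch.get();
        int sent = 0;
        for (int i = 0; i < clientCount; i++) {
            if (data != null) {
                sent++; // stands in for client.sendMessage(data)
            }
        }
        return sent;
    }
}
```

With 5,000 clients, `pushPerClient` calls the fetcher 5,000 times while `pushFetchOnce` calls it once, and both deliver to the same 5,000 clients.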

The solution

  • This is a case of quantitative change leading to qualitative change: with many clients, the shortcomings of our design are exposed. I dug this hole myself. Now that we have found the problem, we can fix it, so let's change the code.

  • I cache the data, because all clients in the same batch are supposed to receive the same data anyway. Moreover, our system can tolerate a certain delay in real-time data. Adding a cache here solves the problem of fetching inside the loop.
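
One possible shape for such a cache is a short-lived, per-interface entry with a time-to-live. The sketch below is an assumption, not the actual change: the class `PushCache`, the TTL parameter, and the injectable clock are all hypothetical, and the article does not say how long the real cache keeps its data.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical short-lived cache keyed by interface signature.
class PushCache {
    private static final class Entry {
        final String value;
        final long expiresAt;

        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;      // assumed acceptable staleness
    private final LongSupplier clock;  // injectable so tests need not sleep

    PushCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value while it is fresh; otherwise calls the loader and stores the result.
    String get(String signature, Supplier<String> loader) {
        long now = clock.getAsLong();
        Entry entry = entries.compute(signature, (key, old) ->
                (old != null && now < old.expiresAt)
                        ? old
                        : new Entry(loader.get(), now + ttlMillis));
        return entry.value;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. Inside the push loop, every client then calls `cache.get(signature, loader)`, and the expensive reflective call hidden in `loader` runs at most once per TTL window instead of once per client.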

  • In testing, this change reduced CPU usage by roughly a factor of 100.

Conclusion

  • Finishing feature development only shows that the feature works under test conditions.
  • Single-user and multi-user are two very different scenarios. We should consider the amount of data as much as possible from the beginning of the functional design.
  • The only thing I did well was isolating the data parsing through the chain of responsibility pattern. Otherwise, locating this kind of problem would have been much more troublesome.