I'm Kite, and I run the public account “ancient time kite”, which is about more than just technology. I have worked in application development for many years, mainly in Java, and I'm a "slash" developer who is also pretty handy with Python and React. The Spring Cloud series is complete, and you can find the full series on my GitHub. You can also reply “PDF” in the public account to get the full PDF version of the tutorial.
Over lunch that day, a colleague said, “That project team is driving me mad. Their program has a problem. I @-mentioned them in the group chat this morning and they only replied at noon. I could almost laugh.”
Generally speaking, when an integration has a problem and the mistake is not obvious, I first suspect my own side, so as not to lose face later. So I said that after lunch I would help him figure out what the problem was.
The background
Our system is integrated with a number of third-party systems, and the problem lies with one of them. The requirement is very simple: their system generates personal to-do tasks, and the number of pending tasks needs to be pushed to our app and displayed as a badge on the icon.
User data is already synced between the two systems, so this really is a simple requirement. The badge does not need to be real-time; refreshing it every 10 minutes is enough. This is a typical scenario, and a message queue fits it perfectly: they push the data to the queue, we consume it from the queue, done.
They say their system is productized, does not support message queues, and only exposes an interface for to-do tasks. OK (smiling face), you're the product, you're right. There are probably only 300-odd users with to-do tasks, so we make 300+ requests every 10 minutes. No multithreading, just a simple loop over the 300+ requests, and each run takes about a minute.
Yeah, well, that’s fine.
By the way, their service runs on JDK 1.6; supposedly, for historical reasons, nobody dares to upgrade it. On top of that, it is deployed on Windows. (Amazing, isn't it?)
Twists and turns
So, let's write a scheduled task that makes 300 requests every 10 minutes. Simple and pleasant.
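For readers who want to picture the caller's side, here is a minimal sketch of such a scheduled task. The `TodoClient` interface, the class name, and the method names are all made up for illustration; the real interface of the third-party system is not shown in this post.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class TodoBadgePoller {
    /** Hypothetical stand-in for the third party's to-do interface. */
    interface TodoClient {
        int fetchTodoCount(String userId);
    }

    /** One pass: a plain sequential loop, one request per user, no extra threads. */
    static Map<String, Integer> pollOnce(List<String> userIds, TodoClient client) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String userId : userIds) {
            counts.put(userId, client.fetchTodoCount(userId));
        }
        return counts;
    }

    /** Schedules pollOnce every 10 minutes on a single worker thread. */
    static ScheduledExecutorService start(List<String> userIds, TodoClient client) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                Map<String, Integer> counts = pollOnce(userIds, client);
                // In the real system the counts would be pushed to the app as icon badges.
                System.out.println("polled " + counts.size() + " users");
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and carry on.
                System.err.println("poll failed: " + e);
            }
        }, 0, 10, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

The point of the sketch is that the caller uses exactly one thread, so whatever ate the memory on the other side, it was not concurrency from our end.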
But the good times did not last. The weather does not follow people's wishes, and the server does not follow the programmer's.
The following is my colleague’s experience.
It was the second night, which should have been an ordinary one, when an alert email disturbed my sleep. A look at the logs showed that memory usage had gone too high and blown up, causing the machine to restart automatically. That's the one good thing about Windows: it restarts by itself. Then I had to log in and start the service manually.
A day later, again in the evening, the alert fired again: the server had restarted automatically because memory usage was too high. I logged in and started the service manually once more.
So he reported it to the developers of that service and got this reply: “There is nothing wrong with our service. It must be a problem with your calls. Stop your scheduled task; the problem is on your side.”
So he came to me, explained the situation, and asked what the problem might be.
Me: Are you sure the scheduled task runs once every 10 minutes and is not stuck in an endless loop?
Colleague: Sure.
Me: Does their service use an external cache like Redis?
Colleague: I don’t know.
Me: ... Since you are sure your calls are fine, their program must have a problem that blows up the memory. What is there to doubt? Let them fix it.
Colleague: Now they say they have no problem.
Digging out the culprit
Well, if they say nothing is wrong, I'll help him track it down. So I connected remotely to the Windows server.
By this time the scheduled task had been running for two days, and more than 15 GB of the 16 GB of memory was in use. It looked as if it could crash at any moment. Even after we stopped the scheduled task, memory usage did not come down.
I started to wonder whether they were using an external cache like Redis. When I checked the server, neither Redis nor memcached was installed, so I ruled out an external cache. (A later look with the JVM tools confirmed this as well.)
Since it was not an external cache, it had to be in the JVM: either the service was using an in-JVM cache, or it was leaking memory. I decided to use jinfo -flags to check the JVM startup parameters, but it turned out that this JDK 6 did not support -flags.
Then, and I can't remember whether I actually tried jmap -heap or just read jmap -help and concluded that jmap -heap was not supported, I ended up looking at the JVM through JConsole. The JVM parameters had clearly been left at their defaults, and the odd part was that the heap was only using a bit over 700 MB. 700 MB versus 15 GB: how could the gap be that big? It didn't make sense. So the heap was not the problem.
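As an aside, the two things I was after here (the startup parameters and the heap usage) can also be read from inside a JVM through the standard `java.lang.management` API. This is only a sketch of that idea, not what I did on the server: it inspects the JVM it runs in, whereas JConsole attaches to another process. The class and method names are my own.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

class JvmSnapshot {
    /** Startup arguments passed to this JVM, such as -Xmx or -Xss; empty when none were passed. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Heap currently in use, in megabytes. Thread stacks are not part of this number. */
    static long heapUsedMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Heap used (MB): " + heapUsedMb());
    }
}
```

The comment on `heapUsedMb` is the key point for this story: a small heap figure says nothing about memory that lives outside the heap.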
Then I triggered a GC, and nothing improved. At this point I strongly suspected a memory leak.
I ran jmap -dump to dump the heap and thread information and pulled the file to my local machine for analysis. I got a shock when I opened it: there were enough threads to suffocate anyone.
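If you want a quick thread headcount without pulling a dump, a JVM can also count its own live threads. The sketch below groups them by state; it is an in-process analogue of what I saw in the dump, assuming you are able to run code inside the suspect service, and the class name is made up.

```java
import java.util.EnumMap;
import java.util.Map;

class ThreadCensus {
    /** Counts the live threads of this JVM, grouped by thread state. */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Total number of threads across all states. */
    static int total(Map<Thread.State, Integer> counts) {
        int sum = 0;
        for (int n : counts.values()) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Thread.State, Integer> counts = countByState();
        System.out.println(total(counts) + " live threads: " + counts);
    }
}
```

A healthy service usually shows a total in the tens or hundreds; a number that only ever grows is the warning sign.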
I have to say, they did one thing very well: they were truly generous with threads. Yes, there were more than 100,000 of them. Let's do the math: 300 requests every 10 minutes means 300 threads; 300 x 6 = 1,800 per hour; 1,800 x 24 = 43,200 per day; so roughly 100,000 threads after a little over two days, which matches exactly. Awesome.
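The same back-of-the-envelope estimate as a tiny program, assuming one leaked thread per request (the class and method names are just for illustration):

```java
class ThreadEstimate {
    /** Threads leaked after the given number of days, assuming one leaked thread per request. */
    static long leakedThreads(int requestsPerRun, int runsPerHour, int days) {
        return (long) requestsPerRun * runsPerHour * 24 * days;
    }

    public static void main(String[] args) {
        // 300 requests per run, one run every 10 minutes = 6 runs per hour.
        System.out.println(leakedThreads(300, 6, 1)); // 43200 per day
        System.out.println(leakedThreads(300, 6, 2)); // 86400 after two days
    }
}
```

With exactly 300 users and exactly 48 hours this gives 86,400; a few more than 300 users and a bit more than two days of uptime is enough to pass 100,000.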
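The default stack size of a thread is 1 MB. Strictly speaking, that is reserved address space rather than memory in use (100,000 x 1 MB would be close to 100 GB, far more than this machine has), and only the part of each stack that is actually touched takes up physical memory. Still, with 100,000+ threads even a modest cost per thread adds up to 10+ GB. Add the heap and the other services on the machine, and 15 GB of memory in use is about right.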
Dealing with the problem
Is it really that hard to go and look when there is a problem? And why refuse to admit that the problem is in your own program?
Fine, you won't look into it, so I found the cause for you. Happy now?
So my colleague sent them the screenshot without saying anything extra.
In the afternoon, the other party posted in the WeChat group: the problem has been fixed, you can try again.
Several days have passed since then, and the problem has not recurred.
Avoiding the problem
Some of you asked: can a system really create more than 100,000 threads? It can. An article by the expert known as “fake you stupid” analyzes, from the source code, how many threads can be created on a Linux system: club.perfma.com/article/244… Have a look if you are interested.
This problem was caused by threads being created but never destroyed.
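I never saw their code, so the following is only a guess at the general shape of such a bug, not their actual implementation. In this sketch each "request" gets its own thread, and every thread blocks forever after its work is done; the latch stands in for whatever it was that kept their threads alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class ThreadLeakDemo {
    /**
     * Starts one thread per "request". Each thread blocks on the latch after its
     * work, so it stays alive until release.countDown() is called.
     */
    static List<Thread> handleRequestsLeaky(int requests, CountDownLatch release) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            Thread t = new Thread(() -> {
                // ... the request's work would happen here ...
                try {
                    release.await(); // never returns on its own: the thread is "leaked"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "leaky-" + i);
            t.setDaemon(true); // so the demo JVM can still exit
            t.start();
            threads.add(t);
        }
        return threads;
    }

    /** Number of threads in the list that are still alive. */
    static long countAlive(List<Thread> threads) {
        long alive = 0;
        for (Thread t : threads) {
            if (t.isAlive()) {
                alive++;
            }
        }
        return alive;
    }

    /** Waits for every thread in the list to finish. */
    static void joinAll(List<Thread> threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> threads = handleRequestsLeaky(50, release);
        System.out.println("alive after the requests: " + countAlive(threads));
        release.countDown(); // the missing "destroy" step
        joinAll(threads);
        System.out.println("alive after release: " + countAlive(threads));
    }
}
```

Every thread stays alive until something lets it finish. In a service where nothing ever does, the count only goes up with each round of requests.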
Logical errors aside, the right way to use threads is through a thread pool, which avoids unnecessary performance overhead as well as the problem of unbounded thread creation with no timely cleanup.
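Here is a minimal sketch of the pooled approach using the standard `ExecutorService`. The pool size of 8 and the class and method names are arbitrary choices for the example; how their service should actually be sized is something only they can decide.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PooledRequests {
    /**
     * Runs the given number of "requests" on a fixed pool and returns the names
     * of the worker threads that were actually used.
     */
    static Set<String> handleRequestsPooled(int requests, int poolSize) {
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < requests; i++) {
                // Each task borrows a worker thread and hands it back when done.
                pool.execute(() -> {
                    workerNames.add(Thread.currentThread().getName());
                });
            }
        } finally {
            pool.shutdown(); // accept no new tasks; workers exit once the queue drains
        }
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return workerNames;
    }

    public static void main(String[] args) {
        Set<String> workers = handleRequestsPooled(300, 8);
        System.out.println("300 requests ran on " + workers.size() + " threads");
    }
}
```

However many requests arrive, the number of threads never exceeds the pool size. In a long-running service you would create the pool once and reuse it, and call `shutdown()` only when the service stops.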
Writing original content is not easy. A small like is a big warm-up, so come and warm me up. Don't be shy, give it a like!