Last week, operations reported an OOM in one of our online services; the application log showed:

Exception in thread "http-nio-8080-exec-1027" java.lang.OutOfMemoryError: Java heap space
Exception in thread "http-nio-8080-exec-1031" java.lang.OutOfMemoryError: Java heap space

Judging by the thread names, these are Tomcat NIO worker threads, and they threw the OOM when they could no longer allocate heap memory while handling requests. Fortunately, the JVM startup parameters included -XX:+HeapDumpOnOutOfMemoryError, so we had an hprof file to open in MAT for analysis.
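
For reference, the relevant JVM flags look roughly like this; the heap size, dump path, and jar name are illustrative placeholders, not our exact values:

```
java -Xmx8g \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/data/dumps \
     -jar app.jar
```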

The first step is to open MAT's Histogram view and see which objects are the biggest memory consumers. The heap turned out to be dominated by huge byte[] buffers, and checking the application configuration quickly pointed to this setting:

max-http-header-size: 10000000

At this point it is fairly safe to say that the unreasonably large maximum HTTP header size is the likely cause of the problem, but that leaves three more questions:

  1. Even if each request allocates 10 MB, the heap is 8 GB; is there really that much concurrency, i.e. 800 Tomcat threads?
  2. This parameter only sets the maximum header size to 10 MB; why does Tomcat allocate such a large buffer all at once?
  3. Why are there so many Tomcat threads? It does not feel like the application sees that much concurrency.

Let's look at question 1 first; the answer can be found in the same dump with MAT. Opening the Threads view and searching for the Tomcat worker threads shows that the thread count is indeed large (401), although that is only half of the 800 estimated in question 1.

As for question 2: max-http-header-size is a property defined by Spring Boot. Looking at the Spring Boot code, you can see that this value is passed to Tomcat as maxHttpHeaderSize.
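
A minimal sketch of what that mapping amounts to, assuming a standard Spring Boot 2.x app with embedded Tomcat; this is a hand-written illustration, not Spring Boot's actual auto-configuration code, and the class and bean names are made up:

```java
import org.apache.catalina.connector.Connector;
import org.apache.coyote.ProtocolHandler;
import org.apache.coyote.http11.AbstractHttp11Protocol;
import org.springframework.boot.web.embedded.tomcat.TomcatServletWebServerFactory;
import org.springframework.boot.web.server.WebServerFactoryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MaxHeaderSizeSketch {

    // Roughly what happens when max-http-header-size is set: the value is pushed
    // down to the embedded Tomcat connector as maxHttpHeaderSize.
    @Bean
    public WebServerFactoryCustomizer<TomcatServletWebServerFactory> maxHttpHeaderSizeCustomizer() {
        return factory -> factory.addConnectorCustomizers((Connector connector) -> {
            ProtocolHandler handler = connector.getProtocolHandler();
            if (handler instanceof AbstractHttp11Protocol) {
                // 10_000_000 mirrors the max-http-header-size value from our config above
                ((AbstractHttp11Protocol<?>) handler).setMaxHttpHeaderSize(10_000_000);
            }
        });
    }
}
```

On the Tomcat side, however, the per-connection read buffer is sized not just at maxHttpHeaderSize but at maxHttpHeaderSize plus the socket read buffer (socket.appReadBufSize), which the Tomcat connector documentation describes as follows: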

   <attribute name="socket.appReadBufSize" required="false">
        <p>(int)Each connection that is opened up in Tomcat get associated with
        a read ByteBuffer. This attribute controls the size of this buffer. By
        default this read buffer is sized at <code>8192</code> bytes. For lower
        concurrency, you can increase this to buffer more data. For an extreme
        amount of keep alive connections, decrease this number or increase your
        heap size.</p>
      </attribute>

This is why we saw so many buffers of 10,008,192 bytes (10,000,000 + 8,192): each of them held only on the order of 10,000 bytes of actual data, and the rest was empty.
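
To make the numbers concrete, here is a small stand-alone sketch (plain arithmetic based on the figures above, not Tomcat source code) of where the 10,008,192-byte buffers come from and what 401 of them add up to:

```java
import java.nio.ByteBuffer;

public class HeaderBufferMath {
    public static void main(String[] args) {
        int maxHttpHeaderSize = 10_000_000; // value of max-http-header-size in our config
        int appReadBufSize = 8_192;         // Tomcat's default socket.appReadBufSize

        // Per-connection header buffer: the two sizes added together, which matches
        // the 10,008,192-byte byte[] instances seen in the heap dump.
        int perConnection = maxHttpHeaderSize + appReadBufSize;
        ByteBuffer buffer = ByteBuffer.allocate(perConnection); // allocated up front, mostly empty
        System.out.println("per-connection buffer: " + buffer.capacity() + " bytes");

        // 401 worker threads were visible in MAT, each potentially holding such a buffer.
        long threads = 401;
        System.out.printf("across %d threads: ~%.1f GiB%n",
                threads, threads * (double) perConnection / (1L << 30));
    }
}
```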

As for question 3, our application clearly has a maximum thread count configured (yes, we set it to 2000, which is admittedly a bit too large), otherwise there would not be 401 worker threads (the default is only 200). And even if concurrency was not high at the time, this is still plausible: when requests execute slowly, more threads are needed even at a modest load. For example, at a TPS of 100 with an average RT of 4 s, you already need about 400 threads. The answer to this question can also be found in MAT: looking at just a few of the threads shows that many of them are waiting for an external service to return, which means that external service is slow. Searching the application log confirms it, with plenty of "feign.RetryableException: Read timed out" entries... Case closed! As a follow-up, our Feign timeouts need to be configured properly so that we are not dragged down by a slow external service.
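
A minimal sketch of capping those timeouts, assuming Spring Cloud OpenFeign; the 2 s connect / 5 s read values are placeholders, and on older Feign versions the millisecond-based Options(connectTimeoutMillis, readTimeoutMillis) constructor may be the one available instead:

```java
import feign.Request;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.util.concurrent.TimeUnit;

@Configuration
public class FeignTimeoutConfig {

    // With Spring Cloud OpenFeign, a Request.Options bean replaces the default
    // connect/read timeouts for Feign clients, so a slow downstream service fails
    // fast instead of pinning a Tomcat worker thread for seconds per request.
    @Bean
    public Request.Options feignRequestOptions() {
        // placeholder values, not our production settings
        return new Request.Options(2, TimeUnit.SECONDS, 5, TimeUnit.SECONDS, true);
    }
}
```

Declaring a single Request.Options bean applies to every Feign client; per-client configuration is also possible if only the slow service needs a tighter limit.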