Locating and fixing an excessive thread count in an online service

Symptom:

On the afternoon of March 21, 2021, the online API service of Furen Station failed and could no longer serve external requests, interrupting online devices and the related business processes.

Temporary solution:

To avoid a prolonged outage, the service was restarted.

Locating the problem:

1. For several days after March 21, observe the thread count and memory usage on the server:

pstree -p > threadNum.txt

PS: Because the server runs many services, the output floods the screen if the command is run directly in the console, so redirect the thread tree to a file for inspection instead.
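In the resulting file, a process with a runaway thread count stands out as a long chain of {java} child entries under a single java process. An illustrative excerpt (process names and PIDs will differ on your machine):

java(4321)─┬─{java}(4322)
           ├─{java}(4323)
           ├─{java}(4324)
           ├─...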

2. From the thread tree, identify the process whose thread count is abnormal, then map it back to the service it belongs to:

ps -ef | grep java   

This tells us which Java service owns the process with the excessive thread count.
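To confirm, the thread count of a single process can also be checked directly with standard Linux commands (replace pid with the process id found above):

ps -o nlwp= -p pid
grep Threads /proc/pid/status

The first prints the number of lightweight processes (threads); the second reads the same figure from the kernel's status file.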

3. After locating the faulty service, run the following command to enter its container:

docker exec -it container_name /bin/bash
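If the container name is not known offhand, the running containers can be listed first with the standard Docker CLI:

docker ps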

4. Inside the container, run jstat to check the GC status of the service and determine whether heavy GC activity is pushing it toward an OOM:

jstat -gcutil pid 10000 30

This samples GC utilization every 10 seconds, 30 times in total. In the output:

S0 and S1 show the utilization of the two survivor spaces, as a percentage of their capacity

E shows the utilization of the Eden space

O shows the utilization of the old generation

YGC is the number of young-generation GC events

YGCT is the total time spent in young-generation GC

FGC is the number of full GC events

FGCT is the total time spent in full GC
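For reference, a line of jstat -gcutil output looks roughly like this (values are illustrative; on JDK 8+ the M and CCS columns for metaspace and compressed class space also appear):

  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  96.68  28.23  57.17  95.11  92.42   1542   12.345     3    0.842   13.187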

A high FGC count combined with a long FGCT makes a service outage much more likely. In our case, however, the jstat output showed no abnormal full GC activity.

5. Since jstat revealed no obvious problem, the next step is to dump the service's heap for analysis. The dump is taken with jmap and analyzed with a tool such as Eclipse Memory Analyzer (MAT). First, export the heap dump:

jmap -dump:live,format=b,file=heap.bin pid

or

jmap -dump:live,format=b,file=heap.hprof pid

We use the second form here: the .hprof file can be opened directly in the memory analysis tool, which makes the results easier to read.
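As a quick first look before pulling a full dump, jmap can also print a class histogram of live objects (a standard jmap option):

jmap -histo:live pid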

Analyzing the heap dump points squarely at thread pools: the top three object types are Thread, String, and LinkedBlockingQueue, which gives us an initial lead. Drilling into the individual Thread objects, we find that many of them belong to delayed tasks. Some of that code is our own, and the rest comes from third-party frameworks. Setting aside for the moment whether the frameworks are at fault, we first track down the flagged objects in our own code.

6. Reviewing our code, we found only one place where a thread pool is created, and that pool is created inside a singleton method, so it should not be the source of the excessive threads.
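For illustration, the pattern we found looked roughly like the sketch below (a hypothetical reconstruction, not the actual production code). A pool created once behind a singleton accessor cannot multiply, no matter how often the accessor is called:

import java.util.concurrent.*;

public class DelayTaskPool {
    // Created exactly once when the class is loaded, so every call
    // to get() returns the same pool.
    private static final ExecutorService POOL =
            new ThreadPoolExecutor(10, 10, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

    private DelayTaskPool() {
    }

    public static ExecutorService get() {
        return POOL;
    }
}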

7. At this point the only lead seemed to have gone cold, so we dumped the Java thread stacks with jstack:

jstack -l pid 
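In the dump, an idle pool worker typically looks like the excerpt below (thread name, ids, addresses, and line numbers are illustrative, taken from a JDK 8 environment):

"pool-1-thread-1" #12 prio=5 os_prio=0 tid=0x00007f4c2810b000 nid=0x2f03 waiting on condition [0x00007f4c0d7d6000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000d6e02558> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)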

The dump showed a large number of threads in the WAITING state. Combined with the heap analysis above, we suspected improper creation of thread pools. To verify that hypothesis, we wrote the following reproduction:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class Test {
    public static void main(String[] args) {
        for (int i = 0; i < 10000; i++) {
            // A brand-new pool is created on every iteration and never shut
            // down, so its idle worker threads accumulate.
            ExecutorService service = new ThreadPoolExecutor(10, 10, 60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<>(10),
                    (r, executor) -> System.out.println("No new assignment."));
            service.submit(() -> System.out.println("test"));
        }
    }
}

Run the code, then find the PID of the process with the jps command.

Then capture the current thread stacks with jstack.
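For example (standard JDK tools; replace pid with the value jps prints; the second command counts the WAITING threads in the dump):

jps -l
jstack -l pid | grep -c 'java.lang.Thread.State: WAITING'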

This again shows a large number of WAITING threads, consistent with the production environment. That confirms it: the excessive thread count is caused by improper use of our thread pools.

Searching the codebase for new ThreadPoolExecutor( shows that we do in fact construct ThreadPoolExecutor explicitly in many places, and some of those sites create a new pool per request, so the pools, and their idle threads, pile up over time.

Solution:

Replace the scattered pools with a single global thread pool utility that all services call:

package com.frznkj.common.utils;

import lombok.extern.slf4j.Slf4j;

import java.util.concurrent.*;

/**
 * Service thread pools.
 * Three pools are created when the system starts; all other services use
 * these pools directly instead of creating their own.
 *
 * @author wanggc
 * @version 1.0.0
 * @email [email protected]
 * @date 2021-03-19
 */
@Slf4j
public class ThreadPoolUtil {

    /** An unbounded-queue thread pool with a fixed concurrency of 50 threads. */
    private final static ExecutorService unboundedThreadPool = new ThreadPoolExecutor(
            50, 50, 120, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

    /**
     * At most 50 threads run concurrently in this pool. Slow but small
     * tasks can be placed here for execution.
     */
    public static ExecutorService getUnboundedThreadPool() {
        return unboundedThreadPool;
    }

    /** A bounded-queue thread pool with a maximum concurrency of 150 threads. */
    private final static ExecutorService boundedThreadPool = new ThreadPoolExecutor(
            50, 150, 120, TimeUnit.SECONDS, new LinkedBlockingQueue<>(2048),
            (r, executor) -> log.error("Service wait queue exceeded limit, current task dropped"));

    /**
     * At most 150 threads run concurrently in this pool. Lightweight
     * business functions can be executed here.
     */
    public static ExecutorService getBoundedThreadPool() {
        return boundedThreadPool;
    }

    /** A cached thread pool. */
    private final static ExecutorService cacheExecutor = Executors.newCachedThreadPool();

    /**
     * This pool has no fixed thread limit; it grows and shrinks with the
     * load. Lightweight work can be executed here.
     */
    public static ExecutorService getCacheThreadPool() {
        return cacheExecutor;
    }
}
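Callers then share these pools instead of constructing their own. A minimal usage sketch (OrderService and the task body are illustrative, not from the original code):

import com.frznkj.common.utils.ThreadPoolUtil;

import java.util.concurrent.ExecutorService;

public class OrderService {
    public void handle() {
        // Reuse the shared pool; never new ThreadPoolExecutor(...) per request.
        ExecutorService pool = ThreadPoolUtil.getBoundedThreadPool();
        pool.submit(() -> {
            // lightweight business logic goes here
        });
    }
}

Because the pools are static fields, each JVM creates them exactly once, and the rejection handler on the bounded pool logs and drops overflow work rather than letting threads or queues grow without limit.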