1. Opening Remarks

This was the first incident I handled after joining the company, and it is also the first story from there that I can tell.

Our project includes a report generation service. Each report request is wrapped in a task and generated asynchronously in a thread pool. During generation, the task writes its status to the database so that callers can query the progress.

The generation process pulls data from databases and APIs, fills in the report template, and finally converts it into the actual report (PDF). The main content of the report is an association graph. We define the graph's nodes and edges in advance and hand that relation set to a third-party tool to draw (yEd, a powerful drawing tool that I am admittedly not very familiar with). This story is about that third-party library.

yEd is a very powerful drawing tool, but it has one flaw: once drawing starts, it cannot be cancelled and it does not respond to interrupts. This caused trouble for us. When the graph has a very large number of nodes and edges and contains an abnormal relationship, the drawing can enter an infinite loop. jstack shows the same method being called in the loop over and over, and top shows CPU usage close to 100%. Submitting the same report task several times saturates every core on the server. Eventually the report service stops responding to new requests, and the progress information of reports already in flight never changes.
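To make "does not respond to interrupts" concrete, here is a minimal sketch. It does not use yEd; the busy loop is a hypothetical stand-in for the drawing call, and the class and method names are made up for illustration. It shows that Future.cancel(true) only sets the interrupt flag, so a loop that never checks the flag simply keeps running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class UninterruptibleDemo {

    /**
     * Starts a busy loop that never checks the interrupt flag, cancels it,
     * and reports whether the loop was still running afterwards.
     */
    public static boolean stillRunningAfterCancel() throws InterruptedException {
        AtomicBoolean release = new AtomicBoolean(false);  // escape hatch so the demo can end
        AtomicBoolean finished = new AtomicBoolean(false);
        CountDownLatch started = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();

        Future<?> future = pool.submit(() -> {
            started.countDown();
            // Stand-in for the drawing call: it spins without ever looking at
            // Thread.currentThread().isInterrupted().
            while (!release.get()) {
                Thread.onSpinWait();
            }
            finished.set(true);
        });

        started.await();       // make sure the loop is really running
        future.cancel(true);   // interrupts the worker thread
        Thread.sleep(200);     // give the loop time to react (it will not)
        boolean stillRunning = !finished.get();

        release.set(true);     // only now does the loop exit
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return stillRunning;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("still running after cancel(true): " + stillRunningAfterCancel());
    }
}
```

In this sketch the loop can be released through a flag we control. The situation in the story is worse, because the loop lives inside a closed library and offers no such flag.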

2. Attempted Fixes

After identifying the problem, the whole project team got together to work on it. We tried decompiling the library (yEd is a paid library, not open source) and analyzing the input data, hoping to find the anomaly and fix the problem by correcting the input, but in the end this failed.

3. The Final Solution

This is the main topic of today's post. In the end we decided to extend the thread pool so that it monitors its own worker threads. The monitoring is simple: once a worker thread starts generating a report and the elapsed time exceeds a threshold, the generation is terminated outright. (Under normal conditions a report finishes within about 2 minutes; we set the threshold to 5 minutes.) A callback is then triggered that lowers the input data by one degree and submits the report for generation again. The plotting data defaults to degree 4, and each reduction in degree shrinks the data volume and simplifies the graph structure. We later found that graphs that failed at degree 4 could be generated normally at degree 3.
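The fallback logic on its own can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: render is a hypothetical stand-in for the drawing call that "hangs" at degree 4, the timeout is shortened to milliseconds, and the simulated hang responds to interrupts so that a plain cancel is enough to free the worker.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DegreeFallbackSketch {

    // Hypothetical stand-in for the drawing call: hangs at degree 4, succeeds below.
    static String render(int degree) throws InterruptedException {
        if (degree >= 4) {
            Thread.sleep(60_000); // simulated hang; unlike the real library, it is interruptible
        }
        return "report(degree=" + degree + ")";
    }

    /** Tries startDegree first and lowers the degree by one after every timeout. */
    public static String generateWithFallback(ExecutorService pool, int startDegree,
                                              long timeout, TimeUnit unit) throws Exception {
        for (int degree = startDegree; degree >= 1; degree--) {
            final int current = degree;
            Future<String> future = pool.submit(() -> render(current));
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                // Here a cancel is enough; in the story the worker has to be stopped by force.
                future.cancel(true);
            }
        }
        throw new IllegalStateException("no degree finished within the timeout");
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(generateWithFallback(pool, 4, 200, TimeUnit.MILLISECONDS));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With these assumptions the first attempt times out and the second one returns report(degree=3), which mirrors what we observed with the real graphs.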

With this in mind, the thread pool is encapsulated as follows:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.FutureTask;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // TimeOutHandler is not shown in the original post. This shape is assumed
    // from how it is used: a callback that receives the job that timed out.
    interface TimeOutHandler {
        void doHandle(Runnable job);
    }

    public class TraceThreadPoolExecutor extends ThreadPoolExecutor {
        private static final Logger LOG = LoggerFactory.getLogger(TraceThreadPoolExecutor.class);
        private static final int SHUTDOWN_TIMEOUT = 60;

        // Maps each running task to the worker thread that executes it
        private final Map<Runnable, Thread> tracedThreads = new ConcurrentHashMap<>();
        private final ExecutorService monitor;

        private TraceThreadPoolExecutor(int workerThreads, int monitorThreads) {
            super(workerThreads, workerThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
            monitor = Executors.newFixedThreadPool(monitorThreads);
        }

        public static TraceThreadPoolExecutor createFixedThreadPool(int workerThreads) {
            return new TraceThreadPoolExecutor(workerThreads, workerThreads);
        }

        /**
         * Submit a task to the thread pool and monitor its execution time.
         * If the threshold is exceeded, stop the task.
         */
        public Future<?> submit(Runnable job, int timeOut, TimeUnit timeUnit, TimeOutHandler timeOutHandler) {
            Future<?> futureTaskJob = submit(job);
            monitor.execute(new MonitorJob(job, futureTaskJob, timeOut, timeUnit, timeOutHandler));
            return futureTaskJob;
        }

        public void stop() {
            shutdownExecutorQuietly(monitor);
            shutdownExecutorQuietly(this);
        }

        private void shutdownExecutorQuietly(ExecutorService executor) {
            if (null != executor) {
                try {
                    executor.shutdown();
                    // isShutdown() is always true after shutdown(), so check
                    // isTerminated() to find out whether the tasks have finished
                    if (!executor.isTerminated()) {
                        try {
                            executor.awaitTermination(SHUTDOWN_TIMEOUT, TimeUnit.SECONDS);
                        } catch (InterruptedException ignore) {
                            LOG.warn(ignore.getMessage(), ignore);
                        }
                        if (!executor.isTerminated()) {
                            executor.shutdownNow();
                        }
                    }
                } catch (Exception ignore) {
                    LOG.warn(ignore.getMessage(), ignore);
                }
            }
        }

        @Override
        protected void beforeExecute(Thread worker, Runnable job) {
            // Record the specific thread that runs the task
            recordJobWorker(job, worker);
        }

        @Override
        protected void afterExecute(Runnable job, Throwable error) {
            // Clear the record once the task has finished
            clearJobWorkerRecord(job);
        }

        private void recordJobWorker(Runnable job, Thread worker) {
            tracedThreads.put(job, worker);
        }

        private Thread getTracedThread(Runnable job) {
            return tracedThreads.get(job);
        }

        private void clearJobWorkerRecord(Runnable job) {
            tracedThreads.remove(job);
        }

        private class MonitorJob implements Runnable {
            private final Future<?> futureTaskJob;
            private final Runnable originJob;
            private final int timeOut;
            private final TimeUnit timeUnit;
            private final TimeOutHandler timeOutHandler;

            public MonitorJob(Runnable originJob, Future<?> futureTaskJob, int timeOut,
                              TimeUnit timeUnit, TimeOutHandler timeOutHandler) {
                this.originJob = originJob;
                this.futureTaskJob = futureTaskJob;
                this.timeOut = timeOut;
                this.timeUnit = timeUnit;
                this.timeOutHandler = timeOutHandler;
            }

            @Override
            public void run() {
                if (null != futureTaskJob) {
                    try {
                        futureTaskJob.get(timeOut, timeUnit);
                    } catch (InterruptedException | ExecutionException ignore) {
                        LOG.warn(ignore.getMessage(), ignore);
                    } catch (TimeoutException e) {
                        // Force stop the worker. The pool wraps the job in a FutureTask,
                        // and that wrapper is the key recorded in beforeExecute.
                        // Note: Thread.stop() is deprecated, and on JDK 20 and later
                        // it throws UnsupportedOperationException.
                        Thread worker = getTracedThread((FutureTask<?>) futureTaskJob);
                        if (null != worker && worker.isAlive()) {
                            worker.stop();
                        }
                        // Do callback
                        if (null != timeOutHandler) {
                            timeOutHandler.doHandle(originJob);
                        }
                    }
                }
            }
        }
    }

Here is a test example:

    import java.util.concurrent.TimeUnit;

    public class TraceThreadPoolExecutorTutorial {
        public static void main(String[] args) throws InterruptedException {
            TraceThreadPoolExecutor pool = TraceThreadPoolExecutor.createFixedThreadPool(1);

            // The task sleeps for 60 seconds, but the monitor allows only 10
            pool.submit(() -> {
                try {
                    Thread.sleep(60000);
                } catch (InterruptedException ignore) {
                }
            }, 10, TimeUnit.SECONDS, job -> System.out.println("TimeoutHandler execute!"));

            Thread.sleep(20000);
            System.out.println("Main thread exit!");
            pool.stop(); // shut down both pools so the JVM can exit
        }
    }

    ---------- output ----------
    TimeoutHandler execute!
    Main thread exit!

One question that may come up here: is it a problem to use stop in the code to terminate a thread? Of course it can be, but it depends on the case!

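Let's start with how Thread.stop works: it raises a ThreadDeath error in the target thread, and as that error unwinds the stack, every monitor lock the thread holds is released. So if some business logic relies on a lock, and the thread is stopped in the middle of that logic while the data is in an inconsistent state, other threads can acquire the lock and carry on with the half-updated data. This can obviously lead to some bizarre problems. For this reason Thread.stop has long been deprecated, and from JDK 20 onward it throws UnsupportedOperationException instead of stopping the thread, so the approach here only works on older JDKs.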
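The hazard can be reproduced without calling Thread.stop at all. The sketch below is a simulation under stated assumptions: the Account class is hypothetical, and an Error thrown by hand stands in for the ThreadDeath that would arrive at an arbitrary point. What it shows is only the mechanism: the monitor is released while the data is half updated, and the next thread to take the lock sees that state.

```java
public class HalfUpdatedStateDemo {

    /** Two balances whose sum should always stay at 100. */
    static final class Account {
        private int from = 100;
        private int to = 0;

        synchronized void transfer(int amount, boolean dieHalfway) {
            from -= amount;
            if (dieHalfway) {
                // Simulates the error that an asynchronous stop would raise here.
                throw new Error("simulated asynchronous stop");
            }
            to += amount;
        }

        synchronized int total() {
            return from + to;
        }
    }

    /** Kills a transfer halfway and returns the total another thread then observes. */
    public static int totalSeenByOtherThread() throws InterruptedException {
        Account account = new Account();
        Thread victim = new Thread(() -> account.transfer(30, true));
        victim.setUncaughtExceptionHandler((t, e) -> { }); // keep the demo output quiet
        victim.start();
        victim.join();
        // The lock is free again because the monitor was released during unwinding,
        // but the second half of the transfer never happened.
        return account.total();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("total after interrupted transfer: " + totalSeenByOtherThread());
    }
}
```

The program prints a total of 70 instead of 100: 30 was withdrawn and never deposited, yet nothing prevents other threads from locking the account and using it.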

But none of these problems exist in our story. To recap our situation:

  1. The worker thread calls the yEd library to draw and, for some reason, enters an infinite loop that does not respond to interrupts;
  2. The report thread shares no state that needs to be synchronized with other threads.

So Thread.stop is safe for us, and probably our only option.

In the end the business side accepted our plan. After the change the service ran well in production, and our problem was solved.

Closing Thoughts

The case in this post is quite a special one. First, the worker thread runs a loop that has no exit. Second, the task cannot be cancelled and does not respond to interrupts. Given that background, we chose the solution described above, which in turn relies on understanding how the thread pool and the Thread.stop method work.

If you found this helpful, please follow, bookmark and like. My own understanding is limited, so corrections are welcome wherever something is inaccurate. Thanks!

End.