Background
We ran into a Quartz misfire problem at work. A very important task that should have fired at 04:32 never fired. Fortunately, the design already included a fallback mechanism, so no serious damage was done. But back to the question itself: why did Quartz fail to fire the 04:32 task? Is there a bug in Quartz? That is very unlikely. When you hit a problem, look for the cause in your own setup first. Here is what we found.
Log analysis
Log analysis showed that far fewer tasks fired between 04:30 and 05:45 than in other periods, and that database response times rose sharply during those 75 minutes. The slow database delayed trigger processing, tasks piled up, and some of them eventually misfired. The DBA later told us that this 75-minute window coincided exactly with a scheduled database job, which we concluded was the root cause.
Configuration
Although the root cause lies outside the application, the application should tolerate cases like this and not behave unexpectedly just because the database responds slowly. The problem lies in misfire handling. Our configuration sets the misfire threshold to 60 seconds and the misfire policy to DO_NOTHING: any fire delayed by more than 60 seconds is classified as a misfire, and the trigger simply skips it and waits for its next scheduled fire time. That is why the 04:32 fire was silently dropped.
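To make the threshold concrete, here is a small stdlib-only Java sketch of the rule described above: a fire counts as a misfire when it is picked up more than the threshold after its scheduled time. It is a simplified model, not Quartz's own code. In Quartz the threshold is set by the `org.quartz.jobStore.misfireThreshold` property (in milliseconds); the `MisfireRule` class and `isMisfire` method here are hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper modelling the misfire rule; not part of Quartz.
final class MisfireRule {
    private MisfireRule() {}

    /** True when the fire is picked up later than scheduled + threshold. */
    static boolean isMisfire(LocalTime scheduled, LocalTime pickedUp, Duration threshold) {
        Duration delay = Duration.between(scheduled, pickedUp);
        // A delay exactly equal to the threshold is still just "late", not a misfire.
        return delay.compareTo(threshold) > 0;
    }
}
```

With our 60-second threshold, a 04:32 fire picked up at 04:32:45 is merely late and still runs, while one picked up at 04:33:30 is a misfire and, under DO_NOTHING, is dropped.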
Misfire
Different trigger types support different misfire policies. We only use CronTrigger, so this section covers only CronTrigger's three misfire policies:
Misfire policy | Meaning |
---|---|
DO_NOTHING | Skip the misfired fire and wait for the next scheduled one. This is harmless for tasks that fire frequently, but for tasks with long intervals, or tasks that fire rarely or only once, a skipped fire really matters, so avoid this policy for them |
IGNORE_MISFIRE_POLICY | Ignore misfire handling and fire every missed occurrence. For example, if three fires were missed, all three are fired as soon as conditions allow |
FIRE_ONCE_NOW (the default SMART_POLICY behaves the same for CronTrigger) | If three fires were missed, fire once immediately (subject, of course, to conditions such as a free worker thread), then continue at the scheduled frequency. This suits most scenarios, including the one in this article |
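For the problem in this article, switching the trigger's misfire policy to FIRE_ONCE_NOW and raising its priority should fix it; at the very least, a fire that should have happened will no longer be skipped. Priority matters because, when several triggers are due at the same time and worker threads are scarce, higher-priority triggers fire first.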
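The difference between the three policies is easiest to see in a toy replay. The sketch below is a simplified model, not Quartz's implementation: it assumes the scheduler is stalled until `recoveredAt`, counts how many executions happen when it recovers, and uses the oldest overdue fire time to decide whether the trigger misfired. The `MisfirePolicy` enum and `MisfireReplay` class are hypothetical; in real Quartz code the policy is chosen on the trigger's schedule, for example with `CronScheduleBuilder.withMisfireHandlingInstructionFireAndProceed()`.

```java
import java.util.Arrays;

// Names mirror Quartz's CronTrigger policies; the logic below is a simplified model.
enum MisfirePolicy { DO_NOTHING, FIRE_ONCE_NOW, IGNORE_MISFIRE_POLICY }

final class MisfireReplay {
    private MisfireReplay() {}

    /**
     * @param schedule    planned fire times in seconds, ascending
     * @param recoveredAt second at which the stalled scheduler can fire again
     * @param threshold   misfire threshold in seconds
     * @return number of executions performed at recovery time
     */
    static int executionsOnRecovery(long[] schedule, long recoveredAt, long threshold,
                                    MisfirePolicy policy) {
        // Fire times that came due while the scheduler was stalled.
        long[] overdue = Arrays.stream(schedule).filter(t -> t <= recoveredAt).toArray();
        if (overdue.length == 0) {
            return 0;
        }
        // Only the oldest overdue fire time decides whether the trigger misfired.
        boolean misfired = recoveredAt - overdue[0] > threshold;
        if (!misfired) {
            return overdue.length; // merely late: every overdue fire still runs
        }
        switch (policy) {
            case DO_NOTHING:
                return 0;              // skip everything, wait for the next scheduled time
            case FIRE_ONCE_NOW:
                return 1;              // one catch-up execution, then back on schedule
            case IGNORE_MISFIRE_POLICY:
                return overdue.length; // replay every missed fire
            default:
                throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }
}
```

With the numbers from this article (a 04:32 fire picked up two minutes late, 60-second threshold), DO_NOTHING yields zero executions and FIRE_ONCE_NOW yields one, which is the behavior change we wanted.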
Concurrency
While reading the concurrency section of the official documentation, I noticed a common misconception. Job classes allow concurrent execution by default, and adding @DisallowConcurrentExecution prevents it. The annotation is placed on the Job class, but a Job class is not the same thing as a job instance: one Job class can back many JobDetails, and each JobDetail is a separate job instance. The annotation only stops the same JobDetail from running concurrently with itself; two JobDetails built from the same class can still run at the same time.
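A toy model of that rule, using nothing from Quartz itself: the gate below keys its lock on the JobDetail key (`group.name`), so two JobDetails built from the same Job class can run side by side, while a second fire of the same JobDetail has to wait. In Quartz the waiting trigger is held back (blocked) until the running execution completes rather than rejected; the `JobDetailGate` class and the keys used are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of @DisallowConcurrentExecution semantics; not Quartz code.
final class JobDetailGate {
    // Keys ("group.name") of JobDetails currently executing.
    // The lock is per JobDetail key, not per Job class.
    private final Set<String> running = ConcurrentHashMap.newKeySet();

    /** Returns true if this JobDetail may start now, false if it must wait. */
    boolean tryStart(String jobDetailKey) {
        return running.add(jobDetailKey);
    }

    /** Marks the JobDetail as finished so its next fire can start. */
    void finish(String jobDetailKey) {
        running.remove(jobDetailKey);
    }
}
```

If you really need one-at-a-time execution across every JobDetail of a class, the annotation alone will not give it to you; that constraint has to be added separately, for example with a lock shared by those jobs.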
Plug-ins
Quartz provides many hooks and plug-ins for extending its functionality, which you will barely find mentioned in Baidu search results. I strongly recommend reading the plug-in chapter of the official documentation, which introduces the official TriggerHistory plug-in for tracking down and troubleshooting trigger problems. It is enabled with a single property:
```properties
org.quartz.plugin.triggHistory.class = org.quartz.plugins.history.LoggingTriggerHistoryPlugin
```
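The plug-in builds its log lines from `java.text.MessageFormat` templates, which can be overridden through properties such as `org.quartz.plugin.triggHistory.triggerFiredMessage`. The sketch below renders a fired-message template modeled on the example in the Quartz configuration docs. The argument positions used here ({0} trigger name, {1} trigger group, {4} current time, {5} job name, {6} job group) are an assumption based on that example, so check the plug-in's Javadoc before customizing; the `TriggerFiredMessage` class is hypothetical.

```java
import java.text.MessageFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical renderer; the template mirrors the documented example (assumption).
final class TriggerFiredMessage {
    static final String TEMPLATE =
        "Trigger {1}.{0} fired job {6}.{5} at: {4, date, HH:mm:ss MM/dd/yyyy}";

    static String render(String triggerName, String triggerGroup,
                         String jobName, String jobGroup, Date firedAt) {
        Object[] args = {
            triggerName, triggerGroup,
            null, null,              // previous / next fire time (assumed positions), unused here
            firedAt, jobName, jobGroup
        };
        // Locale.ROOT keeps the formatted digits ASCII regardless of the JVM default locale.
        return new MessageFormat(TEMPLATE, Locale.ROOT).format(args);
    }
}
```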
With this plug-in enabled, a log line is printed whenever a trigger fires, completes, or misfires, which makes troubleshooting much easier:
```text
Trigger groupName.sleepJobTrigger fired job groupName.sleepJob at: 16:48:20 07/25/2021
Trigger groupName.sleepJobTrigger completed firing job groupName.sleepJob at 16:48:22 07/25/2021 with resulting trigger instruction code: DO NOTHING
Trigger groupName.fireJobTrigger misfired job groupName.fireJob at: 16:48:22 07/25/2021. Should have fired at: 16:48:17 07/25/2021
```
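When digging through these logs, the misfire lines are the interesting ones: the gap between their two timestamps shows how late the trigger was when the misfire was detected. A small stdlib-only sketch that extracts that gap, assuming the message layout of the misfire line shown above (the regex and the `MisfireLogScanner` class are hypothetical and will need adjusting if you customize the plug-in's messages):

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner for LoggingTriggerHistoryPlugin misfire lines.
final class MisfireLogScanner {
    // \s+ tolerates single or multiple spaces between the fixed words.
    private static final Pattern MISFIRE = Pattern.compile(
        "misfired job \\S+\\s+at:\\s+(\\d{2}:\\d{2}:\\d{2} \\d{2}/\\d{2}/\\d{4})\\.\\s+"
        + "Should have fired at:\\s+(\\d{2}:\\d{2}:\\d{2} \\d{2}/\\d{2}/\\d{4})");
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("HH:mm:ss MM/dd/yyyy");

    /** Returns how late the misfired trigger was, or empty if the line is not a misfire line. */
    static Optional<Duration> lateness(String line) {
        Matcher m = MISFIRE.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        LocalDateTime detected = LocalDateTime.parse(m.group(1), TS);
        LocalDateTime scheduled = LocalDateTime.parse(m.group(2), TS);
        return Optional.of(Duration.between(scheduled, detected));
    }
}
```

Running it over a whole log file and sorting by lateness quickly shows whether misfires cluster in a particular window, as they did for us between 04:30 and 05:45.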