Recover from trigger ERROR state after Job constructor threw an exception? - quartz.net

When using Quartz.NET to schedule jobs, I occasionally receive an exception when instantiating a job. This, in turn, causes Quartz to set the job's trigger to an error state. When this occurs, the trigger stops firing until someone intervenes manually (restarting the service, since I'm using in-memory job scheduling).
How can I prevent the error state from being set, or at the very least, tell Quartz to retry triggers that are in the error state?
The exception is caused by flaky network calls that fetch configuration data, which is then passed in to the job's constructor. I'm using a custom IJobFactory to do this.
I've seen other references to this without resolutions:
https://groups.google.com/forum/#!topic/quartznet/8qaT70jfJPw
http://forums.terracotta.org/forums/posts/list/2881.page

For the record, I consider this a design flaw of Quartz. If a job can't be constructed once, that doesn't mean it can never be constructed. This is a transient error and should be treated as such. Stopping all future scheduled jobs violates the principle of least astonishment.
Anyway, my hack solution is to catch any errors thrown while constructing my job and, instead of rethrowing or returning null, return a custom IJob that simply logs an error. This isn't perfect, but at least it doesn't prevent the trigger from firing in the future.
public IJob NewJob(TriggerFiredBundle bundle, IScheduler scheduler)
{
    try
    {
        var job = this.container.Resolve(bundle.JobDetail.JobType) as IJob;
        return job;
    }
    catch (Exception ex)
    {
        // Swallow the construction failure so Quartz never moves the trigger to ERROR.
        // LoggingJob is the author's own IJob that only logs when it executes.
        this.logger.Error(ex, "Exception creating job. Giving up and returning a do-nothing logging job.");
        return new LoggingJob(this.logger);
    }
}
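The same fallback idea can be sketched in plain Java. `Job`, `LoggingJob`, `FallbackJobFactory`, and the `Supplier`-based constructor are hypothetical stand-ins, not the real Quartz `IJobFactory`/`JobFactory` API; the sketch only shows that a construction failure is caught inside the factory and turned into a harmless job, so the scheduler never sees the exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a scheduler's job interface; not the Quartz API.
interface Job {
    void execute();
}

// A do-nothing job that only records why the real job could not be built.
final class LoggingJob implements Job {
    private final List<String> log;
    private final String reason;

    LoggingJob(List<String> log, String reason) {
        this.log = log;
        this.reason = reason;
    }

    @Override
    public void execute() {
        log.add("Skipped run: " + reason);
    }
}

final class FallbackJobFactory {
    private final List<String> log = new ArrayList<>();

    // Builds the real job; if construction throws, returns a LoggingJob instead,
    // so the exception never reaches the scheduler.
    Job newJob(Supplier<Job> constructor) {
        try {
            return constructor.get();
        } catch (RuntimeException ex) {
            log.add("Exception creating job: " + ex.getMessage());
            return new LoggingJob(log, ex.getMessage());
        }
    }

    List<String> log() {
        return log;
    }
}
```

The trade-off is the one the answer admits: the failed run is silently replaced by a log entry, but the next firing gets a fresh chance to construct the real job.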

When an exception occurs while the trigger is instantiating the IJob class, the trigger's TRIGGER_STATE changes to ERROR, and a trigger in this state will no longer fire. To re-enable the trigger, you need to change its state to WAITING; then it can fire again.
Here is an example of how you can re-enable your errored trigger:
var triggerKey = new TriggerKey("triggerKey", "triggerGroup");
if (scheduler.GetTriggerState(triggerKey) == TriggerState.Error)
{
    scheduler.ResumeTrigger(triggerKey);
}

Actually, the best way to reset a trigger from the ERROR state (shown here with Quartz for Java and Spring's SchedulerFactoryBean) is:
private final SchedulerFactoryBean schedulerFactoryBean; // injected by Spring

void resetIfErrored(String triggerName, String triggerGroup) throws SchedulerException {
    Scheduler scheduler = schedulerFactoryBean.getScheduler();
    TriggerKey triggerKey = TriggerKey.triggerKey(triggerName, triggerGroup);
    if (scheduler.getTriggerState(triggerKey) == Trigger.TriggerState.ERROR) {
        scheduler.resetTriggerFromErrorState(triggerKey);
    }
}
Note:
You should never manually modify the records in a third-party library's tables. Make all changes through that library's API whenever it provides the functionality.
See JobStoreSupport.resetTriggerFromErrorState.

How can I prevent the error state from being set, or at the very least, tell Quartz to retry triggers that are in the error state?
Unfortunately, in the current version you cannot retry those triggers. As per the Quartz documentation:
It should be extremely rare for this method to throw an exception - basically only the case where there is no way at all to instantiate and prepare the Job for execution. When the exception is thrown, the Scheduler will move all triggers associated with the Job into the Error state, which will require human intervention (e.g. an application restart after fixing whatever configuration problem led to the issue with instantiating the Job).

Simply put, you should follow good object-oriented practice: constructors should not throw exceptions. Try to move fetching the configuration data into the job's execution phase (the Execute method), where a failure is handled per execution and the trigger keeps firing. This might mean passing a service or Func into the constructor that the job can use to pull the data.
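A minimal sketch of that suggestion, in plain Java rather than Quartz.NET: the hypothetical `ReportJob` takes a `Supplier<String>` for its configuration (the Java counterpart of passing a Func) and only calls it inside `execute()`. In a real Quartz job the catch block would log or throw the scheduler's job-execution exception; here it returns a status string so the behaviour is easy to see.

```java
import java.util.function.Supplier;

// Hypothetical job: instead of fetching configuration in its constructor,
// it receives a Supplier and pulls the data each time it executes.
final class ReportJob {
    private final Supplier<String> configSource;

    // The constructor only stores the supplier, so it cannot fail on network errors.
    ReportJob(Supplier<String> configSource) {
        this.configSource = configSource;
    }

    // A failed fetch only affects this one run; the next run tries again.
    String execute() {
        try {
            String config = configSource.get();
            return "ran with " + config;
        } catch (RuntimeException ex) {
            return "failed this run: " + ex.getMessage();
        }
    }
}
```

Because construction can no longer fail, the job factory never throws, and a flaky network call costs one execution instead of the whole trigger.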

To change the trigger state to WAITING, the author also suggests updating the database manually:
[...] You might need to update database manually, but yeah - if jobs cannot be instantiated it's considered quite bad thing and Quartz will flag them as broken.
I created another job, scheduled at app startup, that updates the triggers in the error state to recover them (this only works with a database-backed job store, not with the in-memory RAMJobStore):
UPDATE QRTZ_TRIGGERS SET [TRIGGER_STATE] = 'WAITING' WHERE [TRIGGER_STATE] = 'ERROR'
More information is in this GitHub discussion.
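The recovery sweep can be modelled in memory to show exactly what it changes. `TriggerStore`, `State`, and `recoverErrored()` are hypothetical and not part of any Quartz job store; the conditional `replace` plays the role of the `WHERE [TRIGGER_STATE] = 'ERROR'` clause, so only errored triggers move to WAITING and paused ones are left alone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical trigger states mirroring the names used above; not Quartz's own enum.
enum State { WAITING, ERROR, PAUSED }

final class TriggerStore {
    private final Map<String, State> states = new ConcurrentHashMap<>();

    void put(String key, State state) {
        states.put(key, state);
    }

    State get(String key) {
        return states.get(key);
    }

    // In-memory equivalent of:
    // UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = 'WAITING' WHERE TRIGGER_STATE = 'ERROR'
    // Returns how many triggers were recovered.
    int recoverErrored() {
        int recovered = 0;
        for (String key : states.keySet()) {
            // Atomically swaps ERROR -> WAITING; any other state is untouched.
            if (states.replace(key, State.ERROR, State.WAITING)) {
                recovered++;
            }
        }
        return recovered;
    }
}
```

The sweep is idempotent, so it can run at startup or on a timer. With a real scheduler, the library's own API (such as Quartz for Java's `resetTriggerFromErrorState`) is the safer route, per the note above about not editing library tables directly.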

Related

How to restore runOn Scheduler used in previous operator?

Folks, is it possible to obtain the Scheduler currently in use from within an operator?
The problem I have is that Mono.fromFuture() completes on a native thread (the AWS CRT HTTP client, in my case). As a result, all subsequent operators also execute on that thread, and later code that needs the thread's context class loader finds that it is null. I realize I can call .publishOn(originalScheduler) after .fromFuture(), but I don't know which scheduler is used to materialize the Mono returned by my function.
Is there an elegant way to deal with this?
fun myFunction(): Mono<String> {
    return Mono.just("example")
        .flatMap { value ->
            Mono.fromFuture {
                // invocation of 3rd party library that executes Future on the thread created in native code.
            }
        }
        .map {
            val resource = Thread.currentThread().getContextClassLoader().getResources("META-INF/services/blah_blah")
            // NullPointerException because Thread.currentThread().getContextClassLoader() returns NULL
            resource.asSequence().first().toString()
        }
}
It is not possible, because there's no guarantee that there is a Scheduler at all.
The place where the subscription is made and the data starts flowing could simply be a Thread. There is no mechanism in Java that allows an external actor to submit a task to an arbitrary thread (you have to provide the Runnable at Thread construction).
So no, there's no way of "returning to the previous Scheduler".
Usually, this shouldn't be an issue at all. If your code is reactive, it should also be non-blocking and thus able to "share" whichever thread it currently runs on with other computations.
If your code is blocking, it should off-load the work to a blocking-compatible Scheduler anyway, which you should choose explicitly. Typically: publishOn(Schedulers.boundedElastic()). The same applies to CPU-intensive tasks.
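The idea of explicitly choosing where the rest of the pipeline runs can be shown without Reactor. As a plain-Java analogue (an assumption, not Reactor code), `CompletableFuture.thenApplyAsync(fn, executor)` hands the continuation to an executor you picked, much as `publishOn(...)` does; the hypothetical `nativeLike` thread stands in for the AWS CRT callback thread and has its context class loader set to `null`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class HopBack {
    // Returns "<thread name>:<has context class loader>" for the thread that ran the continuation.
    static String continueOnChosenExecutor() {
        ClassLoader appLoader = HopBack.class.getClassLoader();
        // The executor we choose; its thread factory sets the context class loader explicitly.
        ExecutorService chosen = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "app-worker");
            t.setContextClassLoader(appLoader);
            return t;
        });
        try {
            CompletableFuture<String> future = new CompletableFuture<>();
            // Stands in for the native callback thread: it has no context class loader.
            Thread nativeLike = new Thread(() -> future.complete("example"), "native-like");
            nativeLike.setContextClassLoader(null);

            // Analogue of publishOn(...): the continuation runs on 'chosen', not on 'nativeLike'.
            CompletableFuture<String> where = future.thenApplyAsync(value -> {
                Thread current = Thread.currentThread();
                return current.getName() + ":" + (current.getContextClassLoader() != null);
            }, chosen);

            nativeLike.start();
            return where.join();
        } finally {
            chosen.shutdown();
        }
    }
}
```

The thread factory sets the loader explicitly because the pool thread is created lazily. Here it would be created by the callback thread that submits the first task, and a new thread inherits its creator's context class loader, which is `null` in this case.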

Dataflow/Apache Beam: How can I access state from #FinishBundle?

Our pipeline buffers events and does an external fetch (for enrichment) every 500 events. When a timer fires, the buffered events are processed. Of course, when you have e.g. 503 events, there will be 3 events that were not enriched.
From experiments we learned that @FinishBundle is always called before the timer. It even seems that the result of the bundle is committed before the timer executes (checkpointing?). If we could access the state from @FinishBundle and enrich these last events there, they would be part of the committed bundle.
I believe this would solve our exactly-once problem: currently the timer also needs to fetch, and it will do so again upon re-execution. If we could adjust the state in @FinishBundle, the fetch would no longer be needed, and when the timer re-executes it would start from that state.
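To make the 503-event case concrete, here is a hypothetical plain-Java model of the buffering (the names `BatchBuffer`, `add`, and `flush` are illustrative, not Beam APIs): full batches of 500 are enriched as they fill, and the final `flush()` corresponds to the timer having to fetch for the three leftovers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the buffer-and-enrich step described above (not Beam code):
// events are enriched in full batches, and whatever is left over stays buffered.
final class BatchBuffer {
    private final int batchSize;
    private final List<Integer> buffer = new ArrayList<>();
    private final List<List<Integer>> enrichedBatches = new ArrayList<>();

    BatchBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    void add(int event) {
        buffer.add(event);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    // Called for each full batch, and once more for the remainder (the timer's fetch).
    void flush() {
        if (!buffer.isEmpty()) {
            enrichedBatches.add(new ArrayList<>(buffer)); // stands in for one external fetch
            buffer.clear();
        }
    }

    int pending() {
        return buffer.size();
    }

    List<List<Integer>> batches() {
        return enrichedBatches;
    }
}
```

In this model, the leftover fetch is the one that gets repeated when the timer re-executes, which is the duplication the question wants to move into the committed bundle.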
Apparently, it is not possible to access the state from the @FinishBundle method, as the following gives errors:
@FinishBundle
public void finishBundle(FinishBundleContext c,
                         @StateId("buffer") BagState<AwesomeEvent> bufferState) {
    LOG.info("--- FINISHBUNDLE CALLED ---");
    // TODO: ENRICHMENT
    LOG.info("--- FINISHBUNDLE DONE ---");
}
Am I doing something wrong or is there another way of accessing the state from this function?
Also, am I making the correct assessment about the timer behavior?

Avoid duplicate tasks in ManagedScheduledExecutorService

I'm developing a Java EE 7 application on WildFly 8.2 and need to run a periodic background task. I inject an executor service and schedule a task; this part works fine:
@Resource
private ManagedScheduledExecutorService executorService;
...
executorService.scheduleWithFixedDelay(() -> {
    try {
        // do some stuff
    } catch (Throwable t) {
        log.error("Error", t);
    }
}, 0, 1, TimeUnit.MINUTES);
Now the (actually nice) feature is that upon redeploy the scheduled task is saved and therefore is still scheduled in the new deployment.
But how can I detect if the task is already scheduled to avoid scheduling it multiple times?
I tried to use a ScheduledFuture and cancel the task in @PreDestroy and @PrePassivate:
reloadTreeFuture = executorService.scheduleWithFixedDelay(() -> {
...

@PreDestroy
@PrePassivate
protected void shutdown() {
    reloadTreeFuture.cancel(true);
}
This works fine as long as the task is not executing at the very moment cancel is called. Since the task is long-running and runs frequently, the chance of hitting it in the middle of an execution is fairly high.
If cancel is called while the task is still executing, it seems to do nothing. It returns immediately, and ScheduledFuture.isDone() also returns true, but the logs show that the task keeps executing in the background until it reaches a point where it needs an injected bean that is no longer available because of the undeployment. The process then ends with org.jboss.msc.service.ServiceNotFoundException, but the task is still scheduled.
reloadTreeFuture.cancel(true);
while (!reloadTreeFuture.isDone()) {
    Thread.sleep(200); // I know this is bad - it's just for testing
}
So basic question: how can I make sure the task is not scheduled twice (or even more)?
You could record an ID for each task and check it before executing. This may not be the best solution, but it works.
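A minimal sketch of the ID check, assuming a shared registry that both the old and the new deployment can see. `TaskRegistry` and `ReloadTreeTask` are hypothetical; in a real redeploy the registry would have to live in shared storage such as a database row, because each deployment loads its own copy of the class and a static field would not be shared between them.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry remembering which task ID is the current one.
// Stands in for shared storage (e.g. a database row) visible to every deployment.
final class TaskRegistry {
    private final AtomicLong currentId = new AtomicLong();

    // Each newly scheduled task registers a fresh ID, making older tasks stale.
    long register() {
        return currentId.incrementAndGet();
    }

    boolean isCurrent(long id) {
        return currentId.get() == id;
    }
}

final class ReloadTreeTask implements Runnable {
    private final TaskRegistry registry;
    private final long id;
    int runs; // how many times this task actually did its work

    ReloadTreeTask(TaskRegistry registry) {
        this.registry = registry;
        this.id = registry.register();
    }

    @Override
    public void run() {
        // Check the ID before executing: a stale copy from an old deployment does nothing.
        if (!registry.isCurrent(id)) {
            return;
        }
        runs++;
        // do some stuff
    }
}
```

Instead of returning, a stale task could throw outside its own try/catch: per the `ScheduledExecutorService` contract, an execution that throws suppresses all later executions of that task, which removes the duplicate for good.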

Gracefully stopping a build step (plugin) on build abort

I have a Jenkins plugin that operates a remote server via REST.
How can I send one last request to the server when the build is aborted, so that the plugin finishes its work gracefully?
The only reference to the 'Abort sequence' that I found is this.
This makes me think that the procedure is very rough, and that I can't catch the signal before it terminates the child (my plugin).
I have a similar need, which I solved with the PostBuildScript Plugin.
I chose to run a build step, but you can pick from several other options. Very easy to use.
I have a script I must run regardless of how the build ended (success, fail or abort). It works great for me.
I hope this helps.
When the build is aborted, an InterruptedException is thrown. You can catch it like any other exception, and not only gracefully exit, but prevent the abort, if you so wish.
In your boolean perform() method, set up a
try {
    ... // Whatever your plugin does
}
catch (InterruptedException e) {
    // Code to handle build being aborted.
}
statement, which will handle any aborts that occur while your build step is being run. In the catch statement you can then throw an InterruptedException again to continue aborting the build, or just return and let the build continue.
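A minimal plain-Java sketch of that pattern, outside Jenkins: `AbortAwareStep`, the `Thread.sleep` placeholder, and the `"POST /stop"` request are hypothetical, not the Jenkins plugin API. Interrupting the worker thread simulates the abort; the catch block sends the final request and rethrows so the abort still goes through.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical build step (not the Jenkins API): long-running work that sends
// one last request when its thread is interrupted, then lets the abort continue.
final class AbortAwareStep {
    final List<String> sentRequests = Collections.synchronizedList(new ArrayList<>());

    boolean perform() throws InterruptedException {
        try {
            Thread.sleep(60_000); // placeholder for whatever the plugin does
            return true;
        } catch (InterruptedException e) {
            sentRequests.add("POST /stop"); // hypothetical final request to the remote server
            throw e; // rethrow so the build keeps aborting
        }
    }

    // Simulates an abort: runs perform() on a worker thread and interrupts it.
    static List<String> simulateAbort() {
        AbortAwareStep step = new AbortAwareStep();
        Thread worker = new Thread(() -> {
            try {
                step.perform();
            } catch (InterruptedException e) {
                step.sentRequests.add("abort propagated");
            }
        });
        worker.start();
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return step.sentRequests;
    }
}
```

The interrupt is delivered whether the worker is already sleeping or not: if the flag is set before `Thread.sleep` starts, the call throws immediately, so the cleanup always runs.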

Polly show dialog after retry count reached

I'm using Polly to retry web service calls when a call fails with a WebException, because I want to make sure the method executed correctly before proceeding. However, sometimes web methods still throw an exception even after several retries, and I don't want to retry forever. Can I use Polly to show a confirmation dialog, e.g. "Max retry count reached! Make sure connection is enabled and press retry.", after which the retry counter resets to its initial value and starts again? Can I achieve this using only Polly, or should I write my own logic? Ideas?
Polly has nothing in-built to manage dialog boxes as it is entirely agnostic to the context in which it is used. However, you can customise extra behaviour on retries with an onRetry delegate so you can hook a dialog box in there. Overall:
Use an outer RetryForever policy, and display the dialog box in the onRetry action configured on that policy.
If you want a way for the user to exit the RetryForever, a cancel action in the dialog could throw some other exception (which you trap with a try-catch round all the policies), to cause an exit.
Within the outer policy, use an inner Retry policy for however many tries you want to make without intervention.
Because this is a different policy instance from the RetryForever policy and has a fixed retry count, its retry count will automatically start afresh each time it is executed.
Use PolicyWrap to wrap the two retry policies together.
In pseudo-code:
var retryUntilSucceedsOrUserCancels = Policy
    .Handle<WhateverException>()
    .RetryForever(onRetry: exception => { /* show my dialog box */ });
var retryNTimesWithoutUserIntervention = Policy
    .Handle<WhateverException>()
    .Retry(n); // or whatever more sophisticated retry style you want
var combined = retryUntilSucceedsOrUserCancels
    .Wrap(retryNTimesWithoutUserIntervention);
combined.Execute(() => { /* my work */ });
Of course the use of the outer RetryForever() policy is just an option: you could also build the equivalent manually.
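Building the equivalent manually might look like the following sketch. `RetryWithPrompt.execute` and its `askUserToRetry` callback (which would show the dialog) are hypothetical; `RuntimeException` stands in for `WebException`. Note that `attempts` counts total tries per round, whereas Polly's `Retry(n)` makes `n` retries after the first attempt.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical manual equivalent of the wrapped policies: an inner loop makes up to
// 'attempts' tries; after each exhausted round, the outer loop asks the user to continue.
final class RetryWithPrompt {
    static <T> T execute(Supplier<T> work, int attempts, BooleanSupplier askUserToRetry) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be at least 1");
        }
        while (true) { // outer loop: the "retry forever" part
            RuntimeException last = null;
            for (int i = 0; i < attempts; i++) { // inner loop: the counter restarts every round
                try {
                    return work.get();
                } catch (RuntimeException ex) { // stands in for WebException
                    last = ex;
                }
            }
            // Inner tries exhausted: show the "Max retry count reached" dialog.
            if (!askUserToRetry.getAsBoolean()) {
                throw last; // user cancelled: give up with the last failure
            }
        }
    }
}
```

Cancelling rethrows the last failure, which plays the same role as the "throw some other exception" exit described above.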
