"Operator called default onErrorDropped" on Mono timeout - project-reactor

In my production code, I am getting errors in my logs when a Mono times out.
I have managed to recreate these errors with the following code:
@Test
public void testScheduler() {
Mono<String> callableMethod1 = callableMethod();
callableMethod1.block();
Mono<String> callableMethod2 = callableMethod();
callableMethod2.block();
}
private Mono<String> callableMethod() {
return Mono.fromCallable(() -> {
Thread.sleep(60);
return "Success";
})
.subscribeOn(Schedulers.elastic())
.timeout(Duration.ofMillis(50))
.onErrorResume(throwable -> Mono.just("Timeout"));
}
In the Mono.fromCallable I am making a blocking call using a third-party library. When this call times out, I get errors similar to:
reactor.core.publisher.Operators - Operator called default onErrorDropped
reactor.core.publisher.Operators - Scheduler worker in group main failed with an uncaught exception
These errors also seem to be intermittent: sometimes when I run the code provided, I get no errors at all. However, when I repeat the call in a loop of, say, 10 iterations, I consistently get them.

Question: Why does this error happen?
Answer:
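When the duration given to the timeout() operator has passed, the operator cancels its upstream subscription and signals onError with a TimeoutException. That results in the following outcomes:
1. An onError signal is sent to the main reactive chain. As a result, the main execution is resumed and the process moves on (i.e., onErrorResume() is executed).
2. Shortly after outcome #1, the cancellation reaches the async task defined within fromCallable() and interrupts the worker thread running it, so Thread.sleep() throws a second exception (InterruptedException). The main reactive chain can no longer handle this InterruptedException because the TimeoutException happened first and already caused the main reactive chain to resume (Note: this behavior of not generating a second onError signal conforms with the Reactive Streams Specification -> Publisher #7).
3. Since the second exception (InterruptedException) can't be delivered to the main chain, Reactor passes it to Operators.onErrorDropped(). With no hook installed, that method logs it at ERROR level ("Operator called default onErrorDropped") and rethrows it on the worker thread, which is where the "Scheduler worker ... failed with an uncaught exception" line comes from. Whether the interrupt actually lands while the callable is still running depends on thread timing, which is a likely reason the errors are intermittent.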
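The same two-failure sequence can be reproduced without Reactor. The following is a plain-JDK analogy, not Reactor's implementation: an ExecutorService stands in for Schedulers.elastic(), Future.get(timeout) for timeout(), and Future.cancel(true) for the cancellation that timeout() sends upstream. The class name TimeoutInterruptDemo is made up for this sketch, and the sleep is lengthened to 5 seconds so the interrupt always lands while the task is still running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

// Plain-JDK analogy (hypothetical, not Reactor code) of timeout -> cancel -> interrupt.
public class TimeoutInterruptDemo {

    // Returns the exception the task hit *after* the caller had already moved on.
    static Throwable run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch taskDone = new CountDownLatch(1);
        AtomicReference<Throwable> lateFailure = new AtomicReference<>();
        try {
            Future<String> future = pool.submit(() -> {
                started.countDown();
                try {
                    Thread.sleep(5_000); // stands in for the blocking third-party call
                    return "Success";
                } catch (InterruptedException e) {
                    lateFailure.set(e); // second failure: nobody is listening any more
                    throw e;            // FutureTask silently discards it after cancel
                } finally {
                    taskDone.countDown();
                }
            });
            started.await(); // make sure the task is running before we time out
            String result;
            try {
                result = future.get(50, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                result = "Timeout";  // first failure: the caller resumes with a fallback
                future.cancel(true); // cancellation interrupts the worker thread
            }
            System.out.println("Result: " + result);
            taskDone.await(1, TimeUnit.SECONDS);
            return lateFailure.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("Late failure: " + run());
    }
}
```

The caller only ever sees "Timeout"; the InterruptedException exists solely inside the task, just as Reactor's second error has no subscriber left to receive it.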
Question: How do I get rid of them?
Short Answer: Use Hooks.onErrorDropped() to change the log level:
Logger logger = Logger.getLogger(this.getClass().getName());
@Test
public void test() {
Hooks.onErrorDropped(error -> {
logger.log(Level.WARNING, "Exception happened:", error);
});
Mono.fromCallable(() -> {
Thread.sleep(60);
return "Success";
})
.subscribeOn(Schedulers.elastic())
.timeout(Duration.ofMillis(50))
.onErrorResume(throwable -> Mono.just("Timeout"))
.doOnSuccess(result -> logger.info("Result: " + result))
.block();
}
Long Answer: If your use-case allows, you could handle the exception happening within fromCallable() so that the only exception affecting the main chain is the TimeoutException. In that case, the onErrorDropped() wouldn't happen in the first place.
@Test
public void test() {
Mono.fromCallable(() -> {
try {
Thread.sleep(60);
} catch (InterruptedException ex) {
//release resources, rollback actions, etc
logger.log(Level.WARNING, "Something went wrong...", ex);
}
return "Success";
})
.subscribeOn(Schedulers.elastic())
.timeout(Duration.ofMillis(50))
.onErrorResume(throwable -> Mono.just("Timeout"))
.doOnSuccess(result -> logger.info("Result: " + result))
.block();
}
Extra References:
https://tacogrammer.com/onerrordropped-explained/
https://medium.com/@kalpads/configuring-timeouts-in-spring-reactive-webclient-4bc5faf56411

Related

Why does StepVerifier.Step.expectNoEvent not fail?

When a flux has a delayElements operator, the StepVerifier.Step.expectNoEvent correctly throws an AssertionError if an event is emitted before the duration.
Consider the following test:
@Test
public void testWithDelayElements() {
StepVerifier.withVirtualTime(() -> Flux.just(1, 2, 3).delayElements(Duration.ofSeconds(1)))
.expectSubscription()
.expectNoEvent(Duration.ofSeconds(2))
.expectNext(1)
.thenCancel()
.verify();
}
which throws
expectation failed (expected no event: onNext(1))
java.lang.AssertionError:...
However, if the flux doesn't have a delayElements operator, then the test passes.
@Test
public void testWithoutDelayElements() {
StepVerifier.withVirtualTime(() -> Flux.just(1, 2, 3))
.expectSubscription()
.expectNoEvent(Duration.ofSeconds(2))
.expectNext(1)
.thenCancel()
.verify();
}
I don't understand why this passes.
Edit:
I see there is a similar GitHub issue which seems to explain the blocking behavior of expectNoEvent in one of the comments. However, that seems to apply when the virtual time scheduler is not being used. I am using the virtual time scheduler, so the thread should not be blocked, right?

Sharing Mono with the publish method doesn't work as expected

I have two service calls. The second one accepts a value that the first returns. I need to return the result of the first call only if the second succeeds. The following is my prototype implementation, however, the resulting mono is always empty. Please explain why it doesn't work and how to implement it the proper way.
@Test
public void testPublish() {
callToService1().publish(
mono -> mono.flatMap(resultOfCall1 -> callToService2(resultOfCall1))
.then(mono)
)
.map(Integer::valueOf)
.as(StepVerifier::create)
.expectNext(1)
.verifyComplete();
}
Mono<String> callToService1() {
return Mono.just("1");
}
Mono<Integer> callToService2(String value) {
// parameter that used in a call to service2
return Mono.empty();
}
Not sure why you used publish(Function). Sounds like your requirement would be fulfilled by a simple direct flatMap:
callToService1()
.flatMap(v1 -> callToService2(v1)
.thenReturn(v1)
);
If callToService2 throws or produces an onError, that error is propagated to the main sequence, terminating it.
Otherwise, inside the flatMap, callToService2 completes, its result (if any) is ignored, and thenReturn emits the still-in-scope v1 value (thenReturn also propagates an onError from callToService2).
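A minimal runnable sketch of the flatMap/thenReturn approach, assuming reactor-core 3.x on the classpath. The two service stubs are copied from the question; the class name ThenReturnExample and failingService2 are hypothetical additions used only to show that an error from the second call terminates the sequence.

```java
import reactor.core.publisher.Mono;

public class ThenReturnExample {

    static Mono<String> callToService1() {
        return Mono.just("1");
    }

    // Same stub as in the question: completes empty.
    static Mono<Integer> callToService2(String value) {
        return Mono.empty();
    }

    // Hypothetical failing variant of service 2, for illustration only.
    static Mono<Integer> failingService2(String value) {
        return Mono.error(new IllegalStateException("service2 failed"));
    }

    // Emits service 1's value once service 2 has completed (empty or not).
    static Mono<Integer> combined() {
        return callToService1()
                .flatMap(v1 -> callToService2(v1).thenReturn(v1))
                .map(Integer::valueOf);
    }

    // Same pipeline, but service 2 errors, so the whole Mono errors.
    static Mono<Integer> combinedWithFailure() {
        return callToService1()
                .flatMap(v1 -> failingService2(v1).thenReturn(v1))
                .map(Integer::valueOf);
    }

    public static void main(String[] args) {
        System.out.println(combined().block()); // prints 1
        try {
            combinedWithFailure().block();
        } catch (IllegalStateException e) {
            System.out.println("Propagated: " + e.getMessage());
        }
    }
}
```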

Java Reactor inappropriate blocking method inside Mono flatMap method

This part of the code inside mono.flatMap(() -> ()), verifyClient.sms(request.phoneNumber(), is giving me an error (or a warning?): "Inappropriate blocking method call". I guess the SMS call is a blocking call from a third-party (TeleSign) SDK.
@Override
public Mono<ResponseEntity<SuccessResponse>> postSms(
Mono<SendSmsDetail> sendSmsDetail, ServerWebExchange exchange) {
return sendSmsDetail.doOnNext(this::validate)
.flatMap(request -> {
try {
return Mono.just(verifyClient.sms(request.phoneNumber(),
buildSmsParam(request)));
} catch (Exception e) {
return Mono.error(new RuntimeException("Fail to verify", e));
}
})
.onErrorResume(this::defaultOrderErrorHandler);
}
Can someone please tell me how to resolve it? I just started using Reactor. By the way, you may notice that the screenshot and the pasted code differ in the .flatMap(tlr -> tlr.) part. The code currently won't compile due to a different return type; I am also trying to make it compile by returning Mono<ResponseEntity<SuccessResponse>>. That's what I am trying to do with the second flatMap: change "tlr" (TeleSignResponse) to my own "SuccessResponse".
I may need a second post on how to make this compile.
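Reactor's reference guide recommends wrapping a synchronous, blocking call in Mono.fromCallable() and moving it onto a scheduler intended for blocking work with subscribeOn(Schedulers.boundedElastic()). The sketch below assumes reactor-core 3.3 or later (where boundedElastic() exists); FakeVerifyClient is a hypothetical stand-in for the TeleSign client, and the return type is simplified to Mono<String> instead of Mono<ResponseEntity<SuccessResponse>>. The question's try/catch becomes onErrorMap(). Whether the IDE inspection stops flagging the call may depend on the IntelliJ version.

```java
import java.time.Duration;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

public class BlockingCallWrapper {

    // Hypothetical stand-in for the TeleSign client; sms() blocks the calling thread.
    static final class FakeVerifyClient {
        String sms(String phoneNumber) throws InterruptedException {
            Thread.sleep(100); // simulates blocking network I/O
            return "sent to " + phoneNumber;
        }
    }

    private final FakeVerifyClient verifyClient = new FakeVerifyClient();

    Mono<String> sendSms(Mono<String> phoneNumber) {
        return phoneNumber.flatMap(number ->
                // defer the blocking call until subscription...
                Mono.fromCallable(() -> verifyClient.sms(number))
                        // ...and run it on a scheduler meant for blocking work
                        .subscribeOn(Schedulers.boundedElastic())
                        // replaces the try/catch from the question
                        .onErrorMap(e -> new RuntimeException("Fail to verify", e)));
    }

    static String demo(String number) {
        return new BlockingCallWrapper()
                .sendSms(Mono.just(number))
                .block(Duration.ofSeconds(5));
    }

    public static void main(String[] args) {
        System.out.println(demo("+15550100"));
    }
}
```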

Project Reactor + flatMap + Multiple onErrorContinue - Not working as expected

When multiple onErrorContinue operators are added to the pipeline to handle specific types of exceptions thrown from flatMap, the exception handling does not work as expected.
With the code below, I expect elements 1 to 6 to be dropped and elements 7 to 10 to be consumed by the subscriber.
public class FlatMapOnErrorContinueExample {
public static void main(String[] args) {
Flux.just(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
.flatMap(number -> {
if (number <= 3) {
return Mono.error(new NumberLesserThanThree("Number is lesser than 3"));
} else if (number > 3 && number <= 6) {
return Mono.error(new NumberLesserThanSixButGretherThan3("Number is grether than 6"));
} else {
return Mono.just(number);
}
})
.onErrorContinue(NumberLesserThanThree.class,
(throwable, object) -> System.err.println("Exception: Dropping the element because it is lesser than 3"))
.onErrorContinue(NumberLesserThanSixButGretherThan3.class,
(throwable, object) -> System.err.println("Exception: Dropping the element because it is lesser than 6 but grether than 3"))
.onErrorContinue((throwable, object) ->
System.err.println("Exception: " + throwable.getMessage()))
.subscribe(number -> System.out.println("number is " + number),
error -> System.err.println("Exception in Subscription " + error.getMessage()));
}
public static class NumberLesserThanThree extends RuntimeException {
public NumberLesserThanThree(final String msg) {
super(msg);
}
}
public static class NumberLesserThanSixButGretherThan3 extends RuntimeException {
public NumberLesserThanSixButGretherThan3(final String msg) {
super(msg);
}
}
}
Here is the output I am getting:
Exception: Dropping the element because it is lesser than 3
Exception: Dropping the element because it is lesser than 3
Exception: Dropping the element because it is lesser than 3
Exception in Subscription Number is grether than 6
Question: Why is the 2nd onErrorContinue not called, and why is the exception sent to the subscriber instead?
Additional Note:
If I remove the 1st and 2nd onErrorContinue, then all exceptions are handled by the 3rd onErrorContinue. I could use this approach to receive all exceptions, check the type of each exception, and handle it accordingly. However, I would like cleaner exception handling than an if..else block.
How is this question different from Why does Thread.sleep() trigger the subscription to Flux.interval()?
1) This question is about exception handling and the order of exception handling; the other question is about processing elements in parallel and making the main thread wait for all the element processing to complete.
3) This question doesn't have any concern about threading; even if I add Thread.sleep(10000) after .subscribe, there is no change in behaviour.
This again comes down to the unusual behaviour of onErrorContinue. It breaks the rule in that it doesn't "catch" errors and then change the behaviour downstream as a result, it actually allows supporting operators to "look ahead" at it and behave accordingly, thus changing the result upstream.
This is weird, and leads to some behaviour that's not immediately obvious, such as is the case here. As far as I'm aware, all supporting operators only look ahead to the next onErrorContinue operator, rather than recursively searching ahead for all such operators. Instead, they will evaluate the predicate of the next onErrorContinue (in this case whether it's of a certain type), and then behave accordingly - either invoking the handler if the predicate returns true, or throwing the error downstream if not. (There's no case where it will then move onto the next onErrorContinue operator, then the next, until a predicate is matched.)
Clearly this is a contrived example, but because of these idiosyncrasies, I'd almost always recommend avoiding onErrorContinue. There are two "normal" ways to handle errors where flatMap() is involved:
If flatMap() has an "inner reactive chain" in it, that is it calls another method or series of methods that return a publisher - then just use onErrorResume() at the end of the flatMap() call to handle those errors. You can chain onErrorResume() since it works with downstream, not upstream operators. This is by far the most common case.
If flatMap() is an imperative collection of if / else that's returning different publishers such as it is here and you want to / have to keep the imperative style, throw exceptions instead of using Mono.error(), and catch as appropriate, returning Mono.empty() in case of an error:
.flatMap(number -> {
try {
if (number <= 3) {
throw new NumberLesserThanThree("Number is lesser than 3");
} else if (number <= 6) {
throw new NumberLesserThanSixButGretherThan3("Number is between 4 and 6");
} else {
return Mono.just(number);
}
}
catch (NumberLesserThanThree ex) {
//Handle it
return Mono.empty();
}
catch (NumberLesserThanSixButGretherThan3 ex) {
//As above; every branch must return a publisher
return Mono.empty();
}
})
In general, using one of these two approaches will make it much easier to reason about what's going on.
(For the sake of completeness after reading the comments - this isn't anything to do with the reactive chain being unable to complete before the main thread exits.)
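A sketch of the first approach applied to the question's example, assuming reactor-core 3.x: the error branches move into a helper (process, a name introduced here), and each type-specific onErrorResume() is attached to the inner Mono inside flatMap(), so it only ever sees the error for that one element. The exception classes are copied from the question; the second exception message is reworded because the original ("Number is grether than 6") doesn't match its branch.

```java
import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class FlatMapOnErrorResumeExample {

    static class NumberLesserThanThree extends RuntimeException {
        NumberLesserThanThree(String msg) { super(msg); }
    }

    static class NumberLesserThanSixButGretherThan3 extends RuntimeException {
        NumberLesserThanSixButGretherThan3(String msg) { super(msg); }
    }

    // Same branching as the flatMap body in the question.
    static Mono<Integer> process(int number) {
        if (number <= 3) {
            return Mono.error(new NumberLesserThanThree("Number is lesser than 3"));
        } else if (number <= 6) {
            return Mono.error(new NumberLesserThanSixButGretherThan3("Number is between 4 and 6"));
        }
        return Mono.just(number);
    }

    static List<Integer> run() {
        return Flux.range(1, 10)
                .flatMap(number -> process(number)
                        // handles only this type, only for this element's inner Mono
                        .onErrorResume(NumberLesserThanThree.class, e -> {
                            System.err.println("Dropping " + number + ": lesser than 3");
                            return Mono.empty();
                        })
                        // chained handler for the other type; order doesn't matter here
                        .onErrorResume(NumberLesserThanSixButGretherThan3.class, e -> {
                            System.err.println("Dropping " + number + ": between 4 and 6");
                            return Mono.empty();
                        }))
                .collectList()
                .block();
    }

    public static void main(String[] args) {
        System.out.println(run()); // [7, 8, 9, 10]
    }
}
```

Because both handlers are ordinary downstream operators on the inner Mono, an error that the first one doesn't match simply flows to the second, which is the behaviour the question expected from chained onErrorContinue calls.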

Not caching error signals in Mono.cache()

Hello, good Reactor folks. I'm trying to write some reactive code (surprising, eh?) and have hit a slight snag. I think it might be a Reactor bug, but thought I'd ask here first before filing a bug report.
For context: I have a cache Map<Key, Mono<Value>>. A client will request data - we check the cache and use what is essentially computeIfAbsent to place a Mono with .cache() into the cache if nothing has yet been cached for that key. The client then takes the Mono and does magic (not relevant here). Now, the catch is that the population of the cache may encounter transient errors, so we don't want to cache errors - the current request will error but the "next" client, when it subscribes, should trigger the entire pipeline to rerun.
Having read around, for example this closed issue, I settled on Mono#cache(ttlForValue, ttlForError, ttlForEmpty).
This is where things get interesting.
As I don't want to cache error (or empty, but ignore that) signals I found the following documentation promising
If the relevant TTL generator throws any Exception, that exception will be propagated to the Subscriber that encountered the cache miss, but the cache will be immediately cleared, so further Subscribers might re-populate the cache in case the error was transient. In case the source was emitting an error, that error is dropped and added as a suppressed exception. In case the source was emitting a value, that value is dropped.
emphasis mine
So I tried the following (shamelessly cribbing the example in the linked GitHub issue)
public class TestBench {
public static void main(String[] args) throws Exception {
var sampleService = new SampleService();
var producer = Mono.fromSupplier(sampleService::call).cache(
__ -> Duration.ofHours(24),
//don't cache errors
e -> {throw Exceptions.propagate(e);},
//meh
() -> {throw new RuntimeException();});
try {
producer.block();
} catch (RuntimeException e) {
System.out.println("Caught exception : " + e);
}
sampleService.serverAvailable = true;
var result = producer.block();
System.out.println(result);
}
static final class SampleService {
volatile boolean serverAvailable = false;
String call() {
System.out.println("Calling service with availability: " + serverAvailable);
if (!serverAvailable) throw new RuntimeException("Error");
return "Success";
}
}
}
Output
09:12:23.991 [main] DEBUG reactor.util.Loggers$LoggerFactory - Using Slf4j logging framework
Calling service with availability: false
09:12:24.034 [main] ERROR reactor.core.publisher.Operators - Operator called default onErrorDropped
java.lang.RuntimeException: Error
at uk.co.borismorris.testbench.TestBench$SampleService.call(TestBench.java:40)
at reactor.core.publisher.MonoSupplier.subscribe(MonoSupplier.java:56)
at reactor.core.publisher.MonoCacheTime.subscribe(MonoCacheTime.java:123)
at reactor.core.publisher.Mono.block(Mono.java:1474)
at uk.co.borismorris..boris.testbench.TestBench.main(TestBench.java:26)
Caught exception : reactor.core.Exceptions$BubblingException: java.lang.RuntimeException: Error
Exception in thread "main" java.lang.RuntimeException: Error
at uk.co.borismorris.testbench.TestBench$SampleService.call(TestBench.java:40)
at reactor.core.publisher.MonoSupplier.subscribe(MonoSupplier.java:56)
at reactor.core.publisher.MonoCacheTime.subscribe(MonoCacheTime.java:123)
at reactor.core.publisher.Mono.block(Mono.java:1474)
at uk.co.borismorris.testbench.TestBench.main(TestBench.java:26)
Suppressed: java.lang.Exception: #block terminated with an error
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:93)
at reactor.core.publisher.Mono.block(Mono.java:1475)
at uk.co.borismorris.testbench.TestBench.main(TestBench.java:31)
Well, that didn't work - the error is cached and the second subscriber just sees the same error.
Looking at the code the cause is obvious
Duration ttl = null;
try {
ttl = main.ttlGenerator.apply(signal);
}
catch (Throwable generatorError) {
signalToPropagate = Signal.error(generatorError);
STATE.set(main, signalToPropagate); //HERE
if (signal.isOnError()) {
//noinspection ThrowableNotThrown
Exceptions.addSuppressed(generatorError, signal.getThrowable());
}
}
The STATE is set to the error signal, not cleared at all. But this isn't the whole story; the reason the code doesn't clear the cache is below this block:
if (ttl != null) {
main.clock.schedule(main, ttl.toMillis(), TimeUnit.MILLISECONDS);
}
else {
//error during TTL generation, signal != updatedSignal, aka dropped
if (signal.isOnNext()) {
Operators.onNextDropped(signal.get(), currentContext());
}
else if (signal.isOnError()) {
Operators.onErrorDropped(signal.getThrowable(), currentContext());
}
//immediate cache clear
main.run();
}
In this case ttl == null because the generation of the ttl threw an Exception. The signal is an error so that branch is entered and Operators.onErrorDropped is called
public static void onErrorDropped(Throwable e, Context context) {
Consumer<? super Throwable> hook = context.getOrDefault(Hooks.KEY_ON_ERROR_DROPPED,null);
if (hook == null) {
hook = Hooks.onErrorDroppedHook;
}
if (hook == null) {
log.error("Operator called default onErrorDropped", e);
throw Exceptions.bubble(e);
}
hook.accept(e);
}
So here we can see that if there is no onErrorDropped hook in the context and no global default is set, then throw Exceptions.bubble(e) is executed, and the code in MonoCacheTime exits early, never calling main.run(). Hence the error stays cached indefinitely, as there is no TTL!
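To make that control flow concrete, here is a deliberately simplified, hypothetical model of the branch quoted above. It is plain Java, not Reactor's code, and every name in it is made up. The only point it shows is that when the default dropped-error handler throws, the cache reset that follows it never runs, whereas an installed hook lets the reset happen.

```java
import java.util.function.Consumer;

// Simplified, hypothetical model of the "ttl == null" branch; not Reactor's implementation.
public class DroppedErrorControlFlow {

    static final class BubblingException extends RuntimeException {
        BubblingException(Throwable cause) { super(cause); }
    }

    private Object cachedSignal;                          // stands in for STATE
    private final Consumer<Throwable> onErrorDroppedHook; // null = no hook installed

    DroppedErrorControlFlow(Consumer<Throwable> hook) {
        this.onErrorDroppedHook = hook;
    }

    // Mirrors the quoted Operators.onErrorDropped: log and throw when no hook is set.
    private void onErrorDropped(Throwable e) {
        if (onErrorDroppedHook == null) {
            System.err.println("Operator called default onErrorDropped: " + e);
            throw new BubblingException(e);
        }
        onErrorDroppedHook.accept(e);
    }

    void ttlGenerationFailed(Throwable sourceError) {
        cachedSignal = sourceError;  // like STATE.set(main, signalToPropagate)
        onErrorDropped(sourceError); // may throw, skipping the next line
        cachedSignal = null;         // like main.run(): immediate cache clear
    }

    static boolean cacheClearedAfterFailure(Consumer<Throwable> hook) {
        DroppedErrorControlFlow cache = new DroppedErrorControlFlow(hook);
        try {
            cache.ttlGenerationFailed(new RuntimeException("Error"));
        } catch (BubblingException e) {
            // the bubbled exception escapes before the reset line runs
        }
        return cache.cachedSignal == null;
    }

    public static void main(String[] args) {
        System.out.println("no hook:   cleared=" + cacheClearedAfterFailure(null));   // false
        System.out.println("with hook: cleared=" + cacheClearedAfterFailure(e -> { })); // true
    }
}
```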
The following code fixes that problem
public class TestBench {
private static final Logger logger = LoggerFactory.getLogger(TestBench.class);
private static final Consumer<Throwable> onErrorDropped = e -> logger.error("Dropped", e);
static {
//add default hook
Hooks.onErrorDropped(onErrorDropped);
}
public static void main(String[] args) throws Exception {
var sampleService = new SampleService();
var producer = Mono.fromSupplier(sampleService::call).cache(
__ -> Duration.ofHours(24),
//don't cache errors
e -> {throw Exceptions.propagate(e);},
//meh
() -> {throw new RuntimeException();});
try {
producer.block();
} catch (RuntimeException e) {
System.out.println("Caught exception : " + e);
}
sampleService.serverAvailable = true;
var result = producer.block();
System.out.println(result);
}
static final class SampleService {
volatile boolean serverAvailable = false;
String call() {
System.out.println("Calling service with availability: " + serverAvailable);
if (!serverAvailable) throw new RuntimeException("Error");
return "Success";
}
}
}
But this adds a global Hook, which isn't ideal. The code hints at the ability to add per-pipeline hooks, but I cannot figure out how to do that. The following works, but is obviously a hack
.subscriberContext(ctx -> ctx.put("reactor.onErrorDropped.local", onErrorDropped))
Questions
Is the above a bug? Should the absence of an onErrorDropped hook cause errors to be cached indefinitely?
Is there a way to set the onErrorDropped hook in the subscriberContext rather than globally?
Follow up
From the code, it seems that returning null from a TTL generator function is supported and behaves the same way as a throwing generator: the signal is immediately cleared. The difference is that the subscriber sees the original error, rather than the error from the TTL generator with the original attached as a suppressed exception, which seems perhaps neater:
public static void main(String[] args) throws Exception {
var sampleService = new SampleService();
var producer = Mono.fromSupplier(sampleService::call).cache(
__ -> Duration.ofHours(24),
//don't cache errors
e -> null,
//meh
() -> null);
try {
producer.block();
} catch (RuntimeException e) {
System.out.println("Caught exception : " + e);
}
sampleService.serverAvailable = true;
var result = producer.block();
System.out.println(result);
}
Is this behaviour supported? Should it be documented?
You've indeed found a bug! And I think the documentation can also be improved for this variant of cache:
The focus on how it deals with exceptions inside TTL Function is probably misleading
There should be a documented, straightforward way of "ignoring" a category of signals in the source (which is your case: you want subsequent subscribers to "retry" when the source is erroring).
The behavior is bugged due to the use of onErrorDropped (which defaults to throwing the dropped exception, thus preventing the main.run() state reset).
Unfortunately, the tests use StepVerifier#verifyThenAssertThat(), which sets an onErrorDropped hook, so that last bug was never identified.
Returning null in the TTL function is not working better because the same bug happens, but this time with the original source exception being dropped/bubbled.
But there is an ideal semantic for propagating an error to the first subscriber and letting the second subscriber retry: return Duration.ZERO from the error TTL Function. This is undocumented, but works right now:
IllegalStateException exception = new IllegalStateException("boom");
AtomicInteger count = new AtomicInteger();
Mono<Integer> source = Mono.fromCallable(() -> {
int c = count.incrementAndGet();
if (c == 1) throw exception;
return c;
});
Mono<Integer> cache = source.cache(v -> Duration.ofSeconds(10),
e -> Duration.ZERO,
() -> Duration.ofSeconds(10));
assertThat(cache.retry().block()).isEqualTo(2);
I'll open an issue to fix the state reset bug and focus the javadoc on the above solution, while moving the bit dealing with throwing TTL Functions in a separate shorter paragraph at the end.
edit: https://github.com/reactor/reactor-core/issues/1783
