How can I make a Flux emit an extra element if no element has been emitted for a given time? - project-reactor

I am implementing a Heartbeat for a WebFlux SSE endpoint. To avoid a timeout in the client, I want to make sure that an element is emitted at least every, say, 10 seconds.
I came up with the following solution that emits a heartbeat element every 10 seconds regardless of whether a real element has been emitted or not:
originalFlux.mergeWith(Flux.interval(Duration.ofSeconds(10), Duration.ofSeconds(10)).map(ignored -> "heartbeat"));
This is probably good enough for my use case, but still I wonder if it is possible to emit the heartbeat only if no real element has been emitted in the last 10 seconds. I played around with the timeout operator, which implements exactly the timing behavior I am looking for, but it emits an error and cancels the originalFlux instead of just emitting an extra element.
The following code using timeout passes my test, but it looks too complicated, and as far as I understand it could lose elements from the originalFlux if they are emitted between cancelling and re-subscribing to it:
ConnectableFlux<String> sharedOriginalFlux = originalFlux.publish();
CompletableFuture<Disposable> eventualSubscription = new CompletableFuture<>();
return addHeartbeat(sharedOriginalFlux)
        .doOnSubscribe(ignored -> eventualSubscription.complete(sharedOriginalFlux.connect()))
        .doFinally(ignored -> eventualSubscription.thenAccept(Disposable::dispose));

private Flux<String> addHeartbeat(Flux<String> sharedOriginalFlux) {
    return sharedOriginalFlux.timeout(
            Duration.ofSeconds(10),
            Flux.mergeSequential(
                    Mono.just("heartbeat"),
                    Flux.defer(() -> addHeartbeat(sharedOriginalFlux))));
}
Is there a simple and safe way to do this?

It's not necessarily simpler, but another option could be to create a separate processor that can wrap the original Flux to provide a heartbeat (which shouldn't miss any elements):
public class HeartbeatProcessor<T> {

    private final FluxProcessor<T, T> processor;
    private final FluxSink<T> sink;
    private final T heartbeatValue;
    private final Duration heartbeatPeriod;
    private Disposable d;

    public HeartbeatProcessor(Flux<T> orig, T heartbeatValue, Duration heartbeatPeriod) {
        this.heartbeatValue = heartbeatValue;
        this.heartbeatPeriod = heartbeatPeriod;
        this.processor = DirectProcessor.<T>create().serialize();
        this.sink = processor.sink();
        // Schedule the first heartbeat, then forward all original elements through emit()
        this.d = Mono.just(heartbeatValue).delayElement(heartbeatPeriod).subscribe(this::emit);
        orig.subscribe(this::emit);
    }

    // Every emission (real element or heartbeat) resets the heartbeat timer,
    // so a heartbeat is only produced if nothing has been emitted for a full period
    private void emit(T val) {
        sink.next(val);
        d.dispose();
        this.d = Mono.just(heartbeatValue).delayElement(heartbeatPeriod).subscribe(this::emit);
    }

    public Flux<T> getFlux() {
        return processor;
    }
}
You could then call it as follows:
new HeartbeatProcessor<>(elements, "heartbeat", Duration.ofSeconds(10))
        .getFlux()
        .subscribe(System.out::println);
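Note that FluxProcessor and DirectProcessor are deprecated as of Reactor 3.4. A minimal sketch of the same idea on top of the Sinks API could look like this (assuming a best-effort multicast sink is acceptable for an SSE use case; the class name HeartbeatSink is mine):
import java.time.Duration;

import reactor.core.Disposable;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.core.publisher.Sinks;

public class HeartbeatSink<T> {

    private final Sinks.Many<T> sink = Sinks.many().multicast().directBestEffort();
    private final T heartbeatValue;
    private final Duration heartbeatPeriod;
    private Disposable d;

    public HeartbeatSink(Flux<T> orig, T heartbeatValue, Duration heartbeatPeriod) {
        this.heartbeatValue = heartbeatValue;
        this.heartbeatPeriod = heartbeatPeriod;
        // Schedule the first heartbeat, then forward the original elements
        this.d = Mono.just(heartbeatValue).delayElement(heartbeatPeriod).subscribe(this::emit);
        orig.subscribe(this::emit);
    }

    // Every emission (real or heartbeat) resets the heartbeat timer
    private synchronized void emit(T val) {
        sink.tryEmitNext(val);
        d.dispose();
        this.d = Mono.just(heartbeatValue).delayElement(heartbeatPeriod).subscribe(this::emit);
    }

    public Flux<T> getFlux() {
        return sink.asFlux();
    }
}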

Flux.switchMap is a good candidate for this job: It switches to a new Publisher (and cancels the previous one) whenever the original Flux emits an item. In your case, the new Publisher is your heartbeat Flux.interval, prepended with the original item T:
public static Flux<String> addHeartbeat(Flux<String> originalFlux) {
    return originalFlux
            .startWith("heartbeat")
            .materialize()
            .switchMap(signal -> switch (signal.getType()) {
                case ON_NEXT -> Flux.interval(Duration.ofSeconds(10))
                        .map(ignored -> "heartbeat")
                        .startWith(signal.get());
                case ON_COMPLETE -> Mono.empty();
                case ON_ERROR -> Mono.error(signal.getThrowable());
                default -> Mono.error(new IllegalStateException());
            });
}
Flux.switchMap is almost fit for the job, but it differs on two points from your desired solution:
It will only emit elements once the first element is received.
This means you have no heartbeat before the first item. This is solved by adding Flux.startWith("heartbeat"), which emits "heartbeat" immediately on subscription; the switchMap then turns it into a heartbeat every 10 seconds.
The Publisher of the last element is never cancelled.
Since every generated Publisher is a Flux.interval that never completes, the onComplete signal would never reach the user. This is solved by transforming the onComplete signal into an emitted Signal item using Flux.materialize(), then mapping the onComplete Signal to an empty Publisher, just to cancel the previous one. This also creates onNext and onError Signals, each of which we have to handle:
a. SignalType.ON_NEXT can be processed as usual, retrieving the original item with Signal.get().
b. SignalType.ON_COMPLETE is mapped to an empty Mono that immediately completes.
c. SignalType.ON_ERROR should relay the error downstream using Mono.error(Throwable).
d. The SignalType enum contains more values, but they are not produced by Flux.materialize().
Here is a test verifying this solution:
@Test
public void shouldAddHeartbeat() {
    Flux<String> originalFlux = Flux.just(25, 15, 7, 5)
            .concatMap(delay -> Mono.delay(Duration.ofSeconds(delay)).thenReturn(delay + " seconds delay"));
    Flux<String> withHeartbeat = addHeartbeat(originalFlux);
    StepVerifier.withVirtualTime(() -> withHeartbeat)
            .expectNext("heartbeat")
            .thenAwait(Duration.ofSeconds(10)).expectNext("heartbeat")
            .thenAwait(Duration.ofSeconds(10)).expectNext("heartbeat")
            .thenAwait(Duration.ofSeconds(5)).expectNext("25 seconds delay")
            .thenAwait(Duration.ofSeconds(10)).expectNext("heartbeat")
            .thenAwait(Duration.ofSeconds(5)).expectNext("15 seconds delay")
            .thenAwait(Duration.ofSeconds(7)).expectNext("7 seconds delay")
            .thenAwait(Duration.ofSeconds(5)).expectNext("5 seconds delay")
            .verifyComplete();
}

Related

How to handle streams and transactions with java-grpc

I'm having an issue with gRPC streaming in Java but I believe this problem also exists in other "reactive" contexts.
If you have a method like this (Kotlin):
@Transactional
override fun get(
    request: DummyRequest,
    responseObserver: StreamObserver<DummyResponse>,
) {
    val counter = AtomicInteger()
    myStreamingDatabaseQuery().use {
        it.iterator().forEach { responseObserver.onNext(it) }
    }
    responseObserver.onCompleted()
}

fun myStreamingDatabaseQuery(): Stream<Dummy> = ...
It will work fine, as the transaction is opened, the stream is processed and closed, and then the transaction is closed.
However, you might then run into memory issues and try to use some sort of flow control like this:
@Transactional
override fun get(
    request: DummyRequest,
    responseObserver: StreamObserver<DummyResponse>,
) {
    val counter = AtomicInteger()
    val iterator = myStreamingDatabaseQuery().iterator()
    StreamObservers.copyWithFlowControl(
        iterator,
        responseObserver as CallStreamObserver<DummyResponse>,
    )
}

fun myStreamingDatabaseQuery(): Stream<Dummy> = ...
This won't work because StreamObservers just sets an onReadyHandler and immediately returns. The stream will then be processed in this handler, after get() has returned - and therefore the transaction will have been closed and it can no longer read the stream.
How is this commonly solved? And how would I do it with grpc-java/Spring?
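No answer is recorded here, but one commonly suggested direction is to avoid holding a transaction open across the asynchronous handler at all, and instead page through the data on demand, with each page read in its own short transaction. A rough Java sketch only; fetchNextPage is a hypothetical helper (e.g. built on Spring's TransactionTemplate) that opens and closes its own transaction per page:
// Sketch: demand-driven paging instead of one long streaming transaction.
// fetchNextPage(offset, limit) is hypothetical and must manage its own
// short-lived transaction for every page it reads.
CallStreamObserver<DummyResponse> call =
        (CallStreamObserver<DummyResponse>) responseObserver;
AtomicLong offset = new AtomicLong();
int pageSize = 500;

call.setOnReadyHandler(() -> {
    // Invoked whenever the transport is ready for more messages.
    while (call.isReady()) {
        List<DummyResponse> page = fetchNextPage(offset.getAndAdd(pageSize), pageSize);
        if (page.isEmpty()) {
            call.onCompleted();
            return;
        }
        page.forEach(call::onNext);
    }
});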

Sharing Mono with the publish method doesn't work as expected

I have two service calls. The second one accepts a value that the first returns. I need to return the result of the first call only if the second succeeds. The following is my prototype implementation; however, the resulting Mono is always empty. Please explain why it doesn't work and how to implement it the proper way.
@Test
public void testPublish() {
    callToService1().publish(mono ->
            mono.flatMap(resultOfCall1 -> callToService2(resultOfCall1))
                    .then(mono))
            .map(Integer::valueOf)
            .as(StepVerifier::create)
            .expectNext(1)
            .verifyComplete();
}

Mono<String> callToService1() {
    return Mono.just("1");
}

Mono<Integer> callToService2(String value) {
    // parameter is used in a call to service2
    return Mono.empty();
}
Not sure why you used publish(Function). Sounds like your requirement would be fulfilled by a simple direct flatMap:
callToService1()
        .flatMap(v1 -> callToService2(v1)
                .thenReturn(v1)
        );
If callToService2 throws or produces an onError, that error will be propagated to the main sequence, terminating it.
(edited below for the requirement of emitting the value from service1)
Otherwise, inside the flatMap, callToService2 runs to completion, then we ignore its result and emit the still-in-scope v1 value thanks to thenReturn (which also propagates onError if callToService2 emits one).
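As a side note, the publish variant presumably comes up empty because .then(mono) re-subscribes to the shared mono only after it has already terminated, so the late subscriber sees completion without a value. Rewritten with the direct flatMap, the original test passes (test method name is mine):
@Test
public void testFlatMap() {
    callToService1()
            .flatMap(v1 -> callToService2(v1).thenReturn(v1))
            .map(Integer::valueOf)
            .as(StepVerifier::create)
            .expectNext(1)
            .verifyComplete();
}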

Project Reactor + flatMap + Multiple onErrorContinue - Not working as expected

When multiple onErrorContinue operators are added to the pipeline to handle specific types of exceptions thrown from flatMap, the exception handling does not work as expected.
In the code below, I expect elements 1 to 6 to be dropped and elements 7 to 10 to be consumed by the subscriber.
public class FlatMapOnErrorContinueExample {

    public static void main(String[] args) {
        Flux.just(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
                .flatMap(number -> {
                    if (number <= 3) {
                        return Mono.error(new NumberLesserThanThree("Number is lesser than 3"));
                    } else if (number > 3 && number <= 6) {
                        return Mono.error(new NumberLesserThanSixButGreaterThan3("Number is greater than 6"));
                    } else {
                        return Mono.just(number);
                    }
                })
                .onErrorContinue(NumberLesserThanThree.class,
                        (throwable, object) -> System.err.println("Exception: Dropping the element because it is lesser than 3"))
                .onErrorContinue(NumberLesserThanSixButGreaterThan3.class,
                        (throwable, object) -> System.err.println("Exception: Dropping the element because it is lesser than 6 but greater than 3"))
                .onErrorContinue((throwable, object) ->
                        System.err.println("Exception: " + throwable.getMessage()))
                .subscribe(number -> System.out.println("number is " + number),
                        error -> System.err.println("Exception in Subscription " + error.getMessage()));
    }

    public static class NumberLesserThanThree extends RuntimeException {
        public NumberLesserThanThree(final String msg) {
            super(msg);
        }
    }

    public static class NumberLesserThanSixButGreaterThan3 extends RuntimeException {
        public NumberLesserThanSixButGreaterThan3(final String msg) {
            super(msg);
        }
    }
}
Here is the output I am getting:
Exception: Dropping the element because it is lesser than 3
Exception: Dropping the element because it is lesser than 3
Exception: Dropping the element because it is lesser than 3
Exception in Subscription Number is greater than 6
Question: Why is the 2nd onErrorContinue not called, and why is the exception sent to the subscriber instead?
Additional Note:
If I remove the 1st and 2nd onErrorContinue, then all exceptions are handled by the 3rd onErrorContinue. I could use this approach to receive all exceptions, check the exception type, and proceed with handling. However, I would like cleaner exception handling rather than adding an if..else block.
How this question differs from Why does Thread.sleep() trigger the subscription to Flux.interval()?
1) This question is about exception handling and the order of exception handling; the other question is about processing elements in parallel and making the main thread wait for all the element processing to complete.
2) This question has no concern about threading; even if I add Thread.sleep(10000) after .subscribe, there is no change in behaviour.
This again comes down to the unusual behaviour of onErrorContinue. It breaks the usual rule in that it doesn't "catch" errors and change the behaviour downstream as a result; instead it allows supporting operators upstream to "look ahead" at it and behave accordingly, thus changing the result upstream.
This is weird, and it leads to behaviour that's not immediately obvious, as is the case here. As far as I'm aware, supporting operators only look ahead to the next onErrorContinue operator, rather than recursively searching ahead for all such operators. They evaluate the predicate of that next onErrorContinue (in this case, whether the error is of a certain type) and behave accordingly: either invoking the handler if the predicate returns true, or throwing the error downstream if not. (There's no case where they move on to the next onErrorContinue operator, then the next, until a predicate is matched.)
Clearly this is a contrived example, but because of these idiosyncrasies I'd almost always recommend avoiding onErrorContinue. There are two "normal" ways to handle errors where flatMap() is involved:
If flatMap() has an "inner reactive chain" in it, that is, it calls another method or series of methods that return a publisher, then just use onErrorResume() at the end of that inner chain to handle those errors. You can chain onErrorResume() calls, since the operator works with downstream, not upstream, operators. This is by far the most common case; see the sketch after the next code block.
If flatMap() is an imperative collection of if / else that returns different publishers, as it is here, and you want to or have to keep the imperative style, throw exceptions instead of using Mono.error(), and catch them as appropriate, returning Mono.empty() in case of an error:
.flatMap(number -> {
    try {
        if (number <= 3) {
            throw new NumberLessThanThree();
        } else if (number <= 6) {
            throw new NumberLessThanSixButGreaterThan3();
        } else {
            return Mono.just(number);
        }
    }
    catch (NumberLessThanThree ex) {
        // Handle it
        return Mono.empty();
    }
    catch (NumberLessThanSixButGreaterThan3 ex) {
        // As above
        return Mono.empty();
    }
})
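For the first approach, a minimal sketch could look like this, assuming a hypothetical callRemoteService method that returns a Mono<Integer> and may fail with the exceptions above:
.flatMap(number -> callRemoteService(number) // hypothetical inner reactive chain
        .onErrorResume(NumberLessThanThree.class, ex -> {
            System.err.println("Dropping: " + ex.getMessage());
            return Mono.empty(); // swallow the error and drop the element
        })
        .onErrorResume(NumberLessThanSixButGreaterThan3.class, ex -> {
            System.err.println("Dropping: " + ex.getMessage());
            return Mono.empty();
        }))
Unlike onErrorContinue, these compose predictably, because each onErrorResume only reacts to errors from the publisher it is chained onto.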
In general, using one of these two approaches will make it much easier to reason about what's going on.
(For the sake of completeness, after reading the comments: this has nothing to do with the reactive chain being unable to complete before the main thread exits.)

Apache Beam Stateful DoFn Periodically Output All K/V Pairs

I'm trying to aggregate (per key) a streaming data source in Apache Beam (via Scio) using a stateful DoFn (using @ProcessElement with @StateId ValueState elements). I thought this would be most appropriate for the problem I'm trying to solve. The requirements are:
for a given key, records are aggregated (essentially summed) across all time - I don't care about previously computed aggregates, just the most recent
keys may be evicted from the state (state.clear()) based on certain conditions that I control
Every 5 minutes, regardless if any new keys were seen, all keys that haven't been evicted from the state should be outputted
Given that this is a streaming pipeline and will be running indefinitely, using a combinePerKey over a global window with accumulating fired panes seems like it will continue to increase its memory footprint and the amount of data it needs to run over time, so I'd like to avoid it. Additionally, when testing this out, (maybe as expected) it simply appends the newly computed aggregates to the output along with the historical input, rather than using the latest value for each key.
My thought was that using a StatefulDoFn would simply allow me to output all of the global state up until now(), but it seems this isn't a trivial solution. I've seen hints at using timers to artificially execute callbacks for this, as well as potentially using a slowly-growing side input map (How to solve Duplicate values exception when I create PCollectionView<Map<String,String>>) and somehow flushing it, but this would essentially require iterating over all values in the map rather than joining on it.
I feel like I might be overlooking something simple to get this working. I'm relatively new to many concepts of windowing and timers in Beam, looking for any advice on how to solve this. Thanks!
You are right that Stateful DoFn should help you here. This is a basic sketch of what you can do. Note that this only outputs the sum without the key. It may not be exactly what you want, but it should help you move forward.
class CombiningEmittingFn extends DoFn<KV<Integer, Integer>, Integer> {

    @TimerId("emitter")
    private final TimerSpec emitterSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

    @StateId("done")
    private final StateSpec<ValueState<Boolean>> doneState = StateSpecs.value();

    @StateId("agg")
    private final StateSpec<CombiningState<Integer, int[], Integer>> aggSpec =
            StateSpecs.combining(
                    Sum.ofIntegers().getAccumulatorCoder(null, VarIntCoder.of()), Sum.ofIntegers());

    @ProcessElement
    public void processElement(ProcessContext c,
            @StateId("agg") CombiningState<Integer, int[], Integer> aggState,
            @StateId("done") ValueState<Boolean> doneState,
            @TimerId("emitter") Timer emitterTimer) throws Exception {
        if (SOME CONDITION) {
            aggState.clear();
            doneState.write(true);
        } else {
            aggState.add(c.element().getValue());
            emitterTimer.align(Duration.standardMinutes(5)).setRelative();
        }
    }

    @OnTimer("emitter")
    public void onEmit(
            OnTimerContext context,
            @StateId("agg") CombiningState<Integer, int[], Integer> aggState,
            @StateId("done") ValueState<Boolean> doneState,
            @TimerId("emitter") Timer emitterTimer) {
        Boolean isDone = doneState.read();
        if (isDone != null && isDone) {
            return;
        } else {
            context.output(aggState.getAccum());
            // Set the timer to emit again
            emitterTimer.align(Duration.standardMinutes(5)).setRelative();
        }
    }
}
Happy to iterate with you on something that'll work.
@Pablo was indeed correct that a StatefulDoFn and timers are useful in this scenario. Here is the code I was able to get working.
Stateful DoFn
// DomainState is a custom case class I'm using
type DoFnT = DoFn[KV[String, DomainState], KV[String, DomainState]]

class StatefulDoFn extends DoFnT {

  @StateId("key")
  private val keySpec = StateSpecs.value[String]()

  @StateId("domainState")
  private val domainStateSpec = StateSpecs.value[DomainState]()

  @TimerId("loopingTimer")
  private val loopingTimer: TimerSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME)

  @ProcessElement
  def process(
      context: DoFnT#ProcessContext,
      @StateId("key") stateKey: ValueState[String],
      @StateId("domainState") stateValue: ValueState[DomainState],
      @TimerId("loopingTimer") loopingTimer: Timer): Unit = {
    // ... logic to create key/value from potentially null values
    if (keepState(value)) {
      loopingTimer.align(Duration.standardMinutes(5)).setRelative()
      stateKey.write(key)
      stateValue.write(value)
      if (flushState(value)) {
        context.output(KV.of(key, value))
      }
    } else {
      stateValue.clear()
    }
  }

  @OnTimer("loopingTimer")
  def onLoopingTimer(
      context: DoFnT#OnTimerContext,
      @StateId("key") stateKey: ValueState[String],
      @StateId("domainState") stateValue: ValueState[DomainState],
      @TimerId("loopingTimer") loopingTimer: Timer): Unit = {
    // ... logic to create key/value checking for nulls
    if (keepState(value)) {
      loopingTimer.align(Duration.standardMinutes(5)).setRelative()
      if (flushState(value)) {
        context.output(KV.of(key, value))
      }
    }
  }
}
With the pipeline:
sc
  .pubsubSubscription(...)
  .keyBy(...)
  .withGlobalWindow()
  .applyPerKeyDoFn(new StatefulDoFn())
  .withFixedWindows(
    duration = Duration.standardMinutes(5),
    options = WindowOptions(
      accumulationMode = DISCARDING_FIRED_PANES,
      trigger = AfterWatermark.pastEndOfWindow(),
      allowedLateness = Duration.ZERO,
      // Only take the latest per key during a window
      timestampCombiner = TimestampCombiner.END_OF_WINDOW
    ))
  .reduceByKey(mostRecentEvent())
  .saveAsCustomOutput(TextIO.write()...)

How to chain an indefinite number of flatMap operators in Reactor?

I have some initial state in my application and a few policies that decorate this state with reactively fetched data (each policy's Mono returns a new instance of the state with additional data). Eventually I get a fully decorated state.
It basically looks like this:
public interface Policy {
    Mono<State> apply(State currentState);
}
Usage for a fixed number of policies would look like this:
Flux.just(baseState)
        .flatMap(firstPolicy::apply)
        .flatMap(secondPolicy::apply)
        ...
        .subscribe();
It basically means that the entry state for each Mono is the result of accumulating the initial state and the results of all of that Mono's predecessors.
In my case the number of policies is not fixed; they come from another layer of the application as a collection of objects that implement the Policy interface.
Is there any way to achieve a similar result as in the given code (with 2 flatMaps), but for an unknown number of policies? I have tried Flux's reduce method, but it works only if the policy returns a value, not a Mono.
This seems difficult because you're streaming your baseState, then trying to do an arbitrary number of flatMap() calls on that. There's nothing inherently wrong with using a loop to achieve this, but I like to avoid that unless absolutely necessary, as it breaks the natural reactive flow of the code.
If you instead iterate and reduce the policies into a single policy, then the flatMap() call becomes trivial:
Flux.fromIterable(policies)
        .reduce((p1, p2) -> s -> p1.apply(s).flatMap(p2::apply))
        .flatMap(p -> p.apply(baseState))
        .subscribe();
If you're able to edit your Policy interface, I'd strongly suggest adding a static combine() method to reference in your reduce() call to make that more readable:
interface Policy {
    Mono<State> apply(State currentState);

    public static Policy combine(Policy p1, Policy p2) {
        return s -> p1.apply(s).flatMap(p2::apply);
    }
}
The Flux then becomes much more descriptive and less verbose:
Flux.fromIterable(policies)
        .reduce(Policy::combine)
        .flatMap(p -> p.apply(baseState))
        .subscribe();
As a complete demonstration, swapping out your State for a String to keep it shorter:
interface Policy {
    Mono<String> apply(String currentState);

    public static Policy combine(Policy p1, Policy p2) {
        return s -> p1.apply(s).flatMap(p2::apply);
    }
}

public static void main(String[] args) {
    List<Policy> policies = new ArrayList<>();
    policies.add(x -> Mono.just("blah " + x));
    policies.add(x -> Mono.just("foo " + x));

    String baseState = "bar";
    Flux.fromIterable(policies)
            .reduce(Policy::combine)
            .flatMap(p -> p.apply(baseState))
            .subscribe(System.out::println); // Prints "foo blah bar"
}
If I understand the problem correctly, the simplest solution is to use a regular for loop:
Flux<State> flux = Flux.just(baseState);
for (Policy policy : policies) {
    flux = flux.flatMap(policy::apply);
}
flux.subscribe();
Also, note that if you have just a single baseState you can use Mono instead of Flux.
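For example, here is the same loop with Mono (a minimal sketch using the Policy interface from the question):
// Each iteration chains another flatMap onto the accumulated Mono
Mono<State> mono = Mono.just(baseState);
for (Policy policy : policies) {
    mono = mono.flatMap(policy::apply);
}
mono.subscribe();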
UPDATE:
If you are concerned about breaking the flow, you can extract the for loop into a method and apply it via the transform operator:
Flux.just(baseState)
        .transform(this::applyPolicies)
        .subscribe();

private Publisher<State> applyPolicies(Flux<State> originalFlux) {
    Flux<State> newFlux = originalFlux;
    for (Policy policy : policies) {
        newFlux = newFlux.flatMap(policy::apply);
    }
    return newFlux;
}
