I am new to Mutiny but have familiarity with Project Reactor.
What is the counterpart to Mono.fromCallable() in Mutiny for creating a Uni from a Callable or an anonymous lambda?
You can create a Uni from a lambda using:
Uni.createFrom().item(() -> "Testing Uni");
For java.util.concurrent.Callable:
Uni.createFrom().future( new FutureTask<>( callable ));
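Note that a FutureTask only completes once something actually runs it. If you just want the Callable invoked lazily when the Uni is subscribed to, here is a minimal sketch using the item(Supplier) variant shown above (callable is just a placeholder name):

Callable<String> callable = () -> "Testing Uni";

// item(Supplier) is evaluated lazily, once per subscription, much like Mono.fromCallable().
// Callable.call() declares a checked exception, so rethrow it as unchecked; the thrown
// exception should then reach the subscriber as a failure event.
Uni<String> uni = Uni.createFrom().item(() -> {
    try {
        return callable.call();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
});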
Other ways you can create a Uni:
Uni.createFrom().item(1); // Emit 1 at subscription time
Uni.createFrom().item(() -> x); // Emit x at subscription time, the supplier is invoked for each subscription
Uni.createFrom().completionStage(cs); // Emit the item from this completion stage
Uni.createFrom().completionStage(() -> cs); // Emit the item from this completion stage, the stage is not created before subscription
Uni.createFrom().failure(exception); // Emit the failure at subscription time
Uni.createFrom().deferred(() -> Uni.createFrom().item(x)); // Defer the Uni creation until subscription. Each subscription can produce a different Uni
Uni.createFrom().nullItem(); // Emit null at subscription time
Uni.createFrom().voidItem(); // Emit a null item of type Void at subscription time (returns Uni<Void>)
Uni.createFrom().nothing(); // Create a Uni not emitting any signal
Uni.createFrom().publisher(publisher); // Create a Uni from a Reactive Streams Publisher
I have some Reactor Kafka code that reads in events via a KafkaReceiver and writes 1..many downstream messages via one or more KafkaSenders that are concatenated into a single Publisher. Everything is working great, but what I'd like to do is emit an event from this concatenated senders Flux only when it is complete (i.e. when it has finished writing to all downstream topics for a given incoming event), rather than emitting something for each element as it writes to Kafka downstream. That way I could sample() and periodically commit offsets, knowing that whenever sample() happens to trigger and I commit offsets for an incoming event, I have processed all downstream messages for that event. It seems like I could use either pauseUntilOther() or then() somehow, but I don't quite see how given my code and specific use case. Any thoughts or suggestions appreciated, thanks.
Main Publisher code:
this.kafkaReceiver.receive()
    .groupBy(m -> m.receiverOffset().topicPartition())
    .flatMap(partitionFlux ->
        partitionFlux.publishOn(this.scheduler)
            .flatMap(this::processEvent)
            .sample(Duration.ofMillis(10))
            .concatMap(sr -> commitReceiverOffset(sr.correlationMetadata())))
    .subscribe();
Concatenated KafkaSenders returned by call to processEvent():
return Flux.concat(kafkaSenderFluxA, kafkaSenderFluxB)
    .doOnComplete(() -> LOG.info("Finished writing all downstream messages for incoming event"));
Sounds like Flux.last() is what you are looking for:
return Flux.concat(kafkaSenderFluxA, kafkaSenderFluxB)
    .doOnComplete(() -> LOG.info("Finished writing all downstream messages for incoming event"))
    .last();
Then your .sample(Duration.ofMillis(10)) would pick up whatever is available as the last item from one or several batches sent to those brokers, and in the end your commitReceiverOffset() would properly commit whatever was last.
See its JavaDocs for more info:
/**
 * Emit the last element observed before complete signal as a {@link Mono}, or emit
 * {@link NoSuchElementException} error if the source was empty.
 * For a passive version use {@link #takeLast(int)}
 *
 * <p>
 * <img class="marble" src="doc-files/marbles/last.svg" alt="">
 *
 * <p><strong>Discard Support:</strong> This operator discards elements before the last.
 *
 * @return a {@link Mono} with the last value in this {@link Flux}
 */
public final Mono<T> last() {
and marble diagram: https://projectreactor.io/docs/core/release/api/reactor/core/publisher/doc-files/marbles/last.svg
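Putting it together, here is a rough sketch (not from the original post) of what processEvent() could return when using last(), so that the outer sample()/commit chain only ever sees an element once every downstream write for that incoming event has completed; sendToTopicA/sendToTopicB and the type parameters are placeholders:

// Returns a single element per incoming event, emitted only after both
// downstream sender Fluxes have completed.
private Mono<SenderResult<ReceiverOffset>> processEvent(ReceiverRecord<String, String> record) {
    Flux<SenderResult<ReceiverOffset>> kafkaSenderFluxA = sendToTopicA(record);
    Flux<SenderResult<ReceiverOffset>> kafkaSenderFluxB = sendToTopicB(record);

    return Flux.concat(kafkaSenderFluxA, kafkaSenderFluxB)
        .doOnComplete(() -> LOG.info("Finished writing all downstream messages for incoming event"))
        // last() collapses the concatenated sends into one element that is only
        // emitted on completion, so sample() never picks up a partially processed event.
        .last();
}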
For some time we have been creating an EmitterProcessor with a built-in sink as follows:
EmitterProcessor<String> emitter = EmitterProcessor.create();
FluxSink<String> sink = emitter.sink(FluxSink.OverflowStrategy.LATEST);
The processor is then exposed as a Flux using Flux.from:
Flux<String> out = Flux
    .from(emitter
        .log(log.getName()));
and the sink can be passed around and populated with strings simply by calling next() on it.
Now we see that EmitterProcessor is deprecated.
It's all replaced with Sinks.many() like this
Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();
but how to use that to publish from?
The answer was calling asFlux() on the Sinks.Many:
Flux<String> out = Flux
    .from(sink.asFlux()
        .log(log.getName()));
We also use that for cancellation and termination of the flux:
sink.asFlux().doOnCancel(() -> {
cancelSink(id, request);
});
/* Handle errors, eviction, expiration */
sink.asFlux().doOnTerminate(() -> {
disposeSink(id);
});
UPDATE: The cancel and terminate don't appear to work, per this question.
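One likely cause, assuming the snippets above reflect the real code: Reactor operators return new Flux instances, so the doOnCancel/doOnTerminate chains built from separate sink.asFlux() calls are never the Flux that actually gets subscribed, and their callbacks never fire. A sketch that chains everything onto the single Flux that is handed out (cancelSink, disposeSink, id and request are the same helpers as above):

Sinks.Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();

// Attach the callbacks to the one Flux that downstream code subscribes to;
// operators applied to a separate sink.asFlux() chain never see the
// subscriber's cancel/terminate signals.
Flux<String> out = sink.asFlux()
    .doOnCancel(() -> cancelSink(id, request))
    .doOnTerminate(() -> disposeSink(id))
    .log(log.getName());

// Publishing replaces the old sink.next(...) calls:
sink.tryEmitNext("some value");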
I have a Pub/Sub topic + subscription and want to consume and aggregate the unbounded data from the subscription in a Dataflow. I use a fixed window and write the aggregates to BigQuery.
Reading and writing (without windowing and aggregation) works fine. But when I pipe the data into a fixed window (to count the elements in each window) the window is never triggered. And thus the aggregates are not written.
Here is my word publisher (it uses kinglear.txt from the examples as input file):
public static class AddCurrentTimestampFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        c.outputWithTimestamp(c.element(), new Instant(System.currentTimeMillis()));
    }
}

public static class ExtractWordsFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        String[] words = c.element().split("[^a-zA-Z']+");
        for (String word : words) {
            if (!word.isEmpty()) {
                c.output(word);
            }
        }
    }
}
// main:
Pipeline p = Pipeline.create(o); // 'o' are the pipeline options
p.apply("ReadLines", TextIO.Read.from(o.getInputFile()))
    .apply("Lines2Words", ParDo.of(new ExtractWordsFn()))
    .apply("AddTimestampFn", ParDo.of(new AddCurrentTimestampFn()))
    .apply("WriteTopic", PubsubIO.Write.topic(o.getTopic()));
p.run();
Here is my windowed word counter:
Pipeline p = Pipeline.create(o); // 'o' are the pipeline options

BigQueryIO.Write.Bound tablePipe = BigQueryIO.Write.to(o.getTable(o))
    .withSchema(o.getSchema())
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND);

Window.Bound<String> w = Window
    .<String>into(FixedWindows.of(Duration.standardSeconds(1)));

p.apply("ReadTopic", PubsubIO.Read.subscription(o.getSubscription()))
    .apply("FixedWindow", w)
    .apply("CountWords", Count.<String>perElement())
    .apply("CreateRows", ParDo.of(new WordCountToRowFn()))
    .apply("WriteRows", tablePipe);
p.run();
The above subscriber will not work, since the window does not seem to trigger using the default trigger. However, if I manually define a trigger the code works and the counts are written to BigQuery.
Window.Bound<String> w = Window.<String>into(FixedWindows.of(Duration.standardSeconds(1)))
    .triggering(AfterProcessingTime
        .pastFirstElementInPane()
        .plusDelayOf(Duration.standardSeconds(1)))
    .withAllowedLateness(Duration.ZERO)
    .discardingFiredPanes();
I'd like to avoid specifying custom triggers if possible.
Questions:
Why does my solution not work with Dataflow's default trigger?
How do I have to change my publisher or subscriber to trigger windows using the default trigger?
How are you determining the trigger never fires?
Your PubSubIO.Write and PubSubIO.Read transforms should both specify a timestamp label using withTimestampLabel, otherwise the timestamps you've added will not be written to PubSub and the publish times will be used.
Either way, the input watermark of the pipeline will be derived from the timestamps of the elements waiting in PubSub. Once all inputs have been processed, it will stay back for a few minutes (in case there was a delay in the publisher) before advancing to real time.
What you are likely seeing is that all the elements are published in the same ~1 second window (since the input file is pretty small). These are all read and processed relatively quickly, but the 1-second window they are put in will not trigger until after the input watermark has advanced, indicating that all data in that 1-second window has been consumed.
This won't happen for several minutes, which may make it look like the trigger isn't working. The trigger you wrote fires after 1 second of processing time, which is much earlier, but there is no guarantee all the data has been processed by then.
Steps to get better behavior from the default trigger:
Use withTimestampLabel on both the write and read pubsub steps (see the sketch after this list).
Have the publisher spread the timestamps out further (e.g., run for several minutes and spread the timestamps out across that range).
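For the first point, a rough sketch against the PubsubIO calls used in the question; the attribute name "event_ts" is arbitrary, and depending on your SDK version the method may be spelled timestampLabel or withTimestampAttribute rather than withTimestampLabel:

// Publisher pipeline: store each element's timestamp in a Pub/Sub attribute.
.apply("WriteTopic", PubsubIO.Write.topic(o.getTopic())
    .withTimestampLabel("event_ts"));

// Subscriber pipeline: read the same attribute back as the element timestamp,
// so the watermark is driven by your timestamps rather than by publish times.
p.apply("ReadTopic", PubsubIO.Read.subscription(o.getSubscription())
    .withTimestampLabel("event_ts"))
    .apply("FixedWindow", w)
    // ... rest of the pipeline unchanged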
I am doing F# programming, and I have some special requirements.
I have 3 class instances; each class instance has to run for one hour every day, from 9:00AM to 10:00AM. I want to control them from the main program, starting them at the same time and stopping them at the same time. The following is my code to start them at the same time, but I don't know how to stop them at the same time.
#light
module Program
open ClassA
open ClassB
open ClassC
let A = new ClassA.A("A")
let B = new ClassB.B("B")
let C = new ClassC.C("C")
let task = [ async { return A.jobA("A") };
             async { return B.jobB("B") };
             async { return C.jobC("C") } ]

task |> Async.Parallel |> Async.RunSynchronously |> ignore
Does anyone know how to stop all 3 class instances at 10:00AM? Please show me your code.
Someone told me that I can use async with cancellation tokens, but since I am calling instances of classes in different modules, it is difficult for me to find suitable code samples.
Thanks,
The jobs themselves need to be stoppable, either by having a Stop() API of some sort, or cooperatively being cancellable via CancellationTokens or whatnot, unless you're just talking about some job that spins in a loop and you'll just thread-abort it eventually? Need more info about what "stop" means in this context.
As Brian said, the jobs themselves need to support cancellation. The programming model for cancellation that works the best with F# is based on CancellationToken, because F# keeps CancellationToken automatically in asynchronous workflows.
To implement the cancellation, your JobA methods will need to take an additional argument:
type A() =
    member x.Foo(str, cancellationToken:CancellationToken) =
        for i in 0 .. 10 do
            cancellationToken.ThrowIfCancellationRequested()
            someOtherWork()
The idea is that you call ThrowIfCancellationRequested frequently during the execution of your job. If a cancellation is requested, the method throws and the operation will stop. Once you do this, you can write an asynchronous workflow that gets the current CancellationToken and passes it to the JobA member when calling it:
let task =
    [ async { let! tok = Async.CancellationToken
              return A.JobA("A", tok) };
      async { let! tok = Async.CancellationToken
              return B.JobB("B", tok) } ]
Now you can create a new token using CancellationTokenSource and start the workflow. When you then cancel the token source, it will automatically stop any jobs running as part of the workflow:
let src = new CancellationTokenSource()
Async.Start(task |> Async.Parallel |> Async.Ignore, cancellationToken = src.Token)
// To cancel the jobs:
src.Cancel()
You asked this question on hubfs.net, and I'll repeat my answer here: try using Quartz.NET. You'd just implement IInterruptableJob in A, B, and C, defining how they stop. Then schedule another job at 10:00AM to stop the others.
Quartz.NET has a nice tutorial, FAQ, and lots of examples. It's pretty easy to use for simple cases like this, yet very powerful if you ever need more complex scheduling, monitoring jobs, logging, etc.
I'm trying to model an asynchronous job processing framework using MailboxProcessor. My requirements are to Start, Stop, Pause, and Resume the job processor. Can I build Pause/Resume functionality with MailboxProcessor? And should I also be able to Stop and Start? I'm trying to model it after a Windows Service.
I have a system in C#, implemented using queues and threads. I was looking for design alternatives, and that's when I saw MailboxProcessor. I believe I can use it, but I couldn't figure out how to handle the above scenarios. So is it possible to achieve this functionality?
Sure :) Just hold an internal queue of jobs and enumerate through the queue when the job processor is in Start mode. In any other mode, just enqueue new jobs until the processor goes into start mode.
type 'a msg = // '
    | Start
    | Stop
    | Pause
    | Job of (unit -> unit)

type processQueue() =
    let mb = MailboxProcessor.Start(fun inbox ->
        let rec loop state (jobs : System.Collections.Generic.Queue<_>) =
            async {
                // The msg type contains a function-typed case, so compare the
                // current state by pattern matching rather than with (=).
                match state with
                | Start ->
                    while jobs.Count > 0 do
                        let f = jobs.Dequeue()
                        f()
                | _ -> ()

                let! msg = inbox.Receive()
                match msg with
                | Start -> return! loop Start jobs
                | Pause -> return! loop Pause jobs
                | Job(f) -> jobs.Enqueue(f); return! loop state jobs
                | Stop -> return ()
            }
        loop Start (new System.Collections.Generic.Queue<_>()))

    member this.Resume() = mb.Post(Start)
    member this.Stop() = mb.Post(Stop)
    member this.Pause() = mb.Post(Pause)
    member this.QueueJob(f) = mb.Post(Job f)
This class behaves as expected: you can enqueue jobs in the Pause state, but they'll only run in the Start state. Once the processQueue is stopped, it can't be restarted, and none of the enqueued jobs will run (it's easy enough to change this behavior so that, rather than killing the queue, it just doesn't enqueue a job in the Stop state).
Use MailboxProcessor.PostAndReply if you need two-way communication between the mailbox processor and your code.
You may want to check out Luca's blog, as I think it has some recent relevant stuff.