I am very confused by publish behavior in Flux. Why does the second subscriber not print anything while the first one does? It is a hot publisher and values are emitted once a second; both should share the same elements.
Flux<String> flux = Flux.fromIterable(Arrays.asList("a", "b", "c", "d", "e", "f"))
        .publish()
        .autoConnect()
        .delayElements(Duration.ofSeconds(1));
flux.subscribe(s -> System.out.println("1 - " + s));
flux.subscribe(s -> System.out.println("2 - " + s));
Interestingly, the share() method shows the output for both subscribers.
The fact it's a "hot publisher" means that you may miss values if they're emitted before you subscribe, which is what's happening here - the first subscriber causes the Flux to start publishing, and it publishes all its values before your second subscriber ever subscribes.
You may expect delayElements() to change this behaviour, as it's delaying each element by a second, which should be more than enough time for the second subscriber to subscribe. However, that isn't happening, because you're only delaying the elements after the publish() and autoConnect() calls - or to put it another way, only after the Flux has been made "hot". This means all the values are emitted near instantly, before being delayed, and before the second subscriber gets a chance to subscribe.
Instead, I suspect you want the delayElements() call before it's a hot Flux as follows:
Flux<String> flux = Flux.fromIterable(Arrays.asList("a", "b", "c", "d", "e", "f"))
        .delayElements(Duration.ofSeconds(1))
        .publish()
        .autoConnect();
This will then delay the elements before the Flux becomes hot, almost certainly giving your second subscriber enough time to subscribe, and printing both sets of results as you expect.
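As for why share() shows output for both subscribers: share() is shorthand for publish().refCount(), and refCount(), unlike autoConnect(), resets once the source terminates, so the second subscriber most likely triggers a completely fresh subscription to the source. A minimal sketch of that difference, assuming the same imports as your snippet (the doOnSubscribe probe is mine, not from the question):
Flux<String> shared = Flux.fromIterable(Arrays.asList("a", "b", "c"))
        .doOnSubscribe(s -> System.out.println("source subscribed"))
        .share() // shorthand for publish().refCount()
        .delayElements(Duration.ofSeconds(1));

shared.subscribe(s -> System.out.println("1 - " + s)); // connects; the source drains instantly
shared.subscribe(s -> System.out.println("2 - " + s)); // that connection already terminated, so
                                                       // refCount re-subscribes: "source subscribed"
                                                       // prints twice
(In a plain main method you would still need to sleep or block to see the delayed values.)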
Reactor has the switchOnNext operator, which mirrors a sequence of publishers, cancelling the previous subscription whenever a new publisher becomes available (see the marble diagram in the Reactor docs).
For my use case I need a variation on this theme: instead of cancelling the first publisher before subscribing to the next, I want to continue mirroring the sequence of publisher 1 until the point when publisher 2 emits its first item, and only then make the switch, as in the marble diagram I've sketched (for anyone who finds this question later: it is not a diagram of an existing operator from the Reactor docs, just one I drew myself).
I appreciate that in the general case this could potentially involve the operator maintaining an unbounded number of subscriptions waiting for any one of them to emit before cancelling the others, but for my use case I know that the initial flux-of-fluxes is finite (so I don't necessarily need the fully general publisher-of-publishers solution, something that works for a finite list of N publishers would be sufficient).
Can anyone see a clever combination of the existing operators that would implement this behaviour or do I need to write it from first principles?
Interesting problem! I think something like this might work:
import java.time.Duration;

import org.junit.jupiter.api.Test; // JUnit 5 assumed; adjust the import for JUnit 4

import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;
import reactor.util.function.Tuples;

@Test
void switchOnNextEmit() {
    Duration gracePeriod = Duration.ofSeconds(2);
    Flux.concat(
            Mono.just(sequence("a", 1)),
            Mono.just(sequence("b", 3)).delaySubscription(Duration.ofSeconds(5)),
            Mono.just(sequence("c", 10)).delaySubscription(Duration.ofSeconds(10)))
        .map(seq -> seq.publish().refCount(1, gracePeriod))
        .scan(
            Tuples.of(Flux.<String>never(), Flux.<String>never()),
            (acc, next) -> Tuples.of(acc.getT2().takeUntilOther(next), next))
        .switchMap(t -> Flux.merge(t.getT1(), t.getT2()))
        .doOnNext(it -> System.out.println("Result: " + it))
        .then()
        .block();
}

private static Flux<String> sequence(String name, int interval) {
    return Flux.interval(Duration.ofSeconds(interval))
        .map(i -> name + i)
        .doOnSubscribe(__ -> System.out.println("Subscribe: " + name))
        .doOnCancel(() -> System.out.println("Cancel: " + name));
}
The important part is that we convert the sequences into hot publishers, meaning that resubscribing doesn't trigger another subscription to the source; instead, the initial subscription is shared. Then we use scan to emit a tuple containing the previous and the next publisher, and finally we use a regular switchMap to observe both (note how the first stops when the second emits, due to takeUntilOther).
Note that the grace period is important because switchMap will first cancel the current publisher and then subscribe to the next one, so without a grace period the current hot publisher would fully stop and start from scratch, which is not what we want.
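To see the grace period in isolation, here is a minimal sketch of my own (not part of the answer above), again assuming the usual Reactor imports: a subscriber arriving within the refCount grace period reuses the live connection instead of restarting the source.
Flux<Long> hot = Flux.interval(Duration.ofMillis(100))
        .doOnSubscribe(s -> System.out.println("source subscribed"))
        .doOnCancel(() -> System.out.println("source cancelled"))
        .publish()
        .refCount(1, Duration.ofSeconds(2));

hot.take(3).blockLast(); // connects: prints "source subscribed"
hot.take(3).blockLast(); // arrives well within the 2s grace period, so the same
                         // connection is reused and "source subscribed" is not printed again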
In the iOS 13 Combine framework, there are three collect operator methods. The first two are obvious but the third uses types I can't figure out.
collect(_:options:)
https://developer.apple.com/documentation/foundation/timer/timerpublisher/3329497-collect
func collect<S>(_ strategy: Publishers.TimeGroupingStrategy<S>,
options: S.SchedulerOptions? = nil)
-> Publishers.CollectByTime<Timer.TimerPublisher, S>
where S : Scheduler
Can anyone give an example of how one would call this method?
After some struggle, I came up with an example like this:
let t = Timer.publish(every: 0.4, on: .main, in: .default)
t
    .scan(0) { prev, _ in prev + 1 }
    .collect(.byTime(DispatchQueue.main, .seconds(1))) // *
    .sink(receiveCompletion: { print($0) }) { print($0) }
    .store(in: &storage)
let cancellable = t.connect()
delay(3) { cancellable.cancel() }
(where storage is the usual Set<AnyCancellable> to keep the subscriber alive, and delay is presumably a small convenience wrapper around DispatchQueue.main.asyncAfter).
The output is:
[1, 2]
[3, 4, 5]
[6, 7]
So we are publishing a new number about every 0.4 seconds, but collect only does its thing every 1 second. Thus, the first two values arrive, publishing 1 and 2, and then collect does its thing, accumulates all the values that have arrived so far, and publishes them as an array, [1,2]. And so on. Every second, whatever has come down the pipeline so far is accumulated into an array and published as an array.
The TimeGroupingStrategy mechanisms are defined in that enum. As of iOS 13.3 there are still just two:
byTime
byTimeOrCount
In either case, the first two parameters are a scheduler on which to run (Immediate, DispatchQueue, RunLoop, or OperationQueue), whose type is usually just inferred from whatever you pass in, and a Stride - a time interval you specify - over which the operator will buffer values.
With byTime, it will collect and buffer as many elements as it receives within the interval you specify (using an unbounded amount of memory to do so). byTimeOrCount additionally limits how many items get buffered to a specific count.
The two means of specifying these are:
let q = DispatchQueue(label: self.debugDescription)
publisher
.collect(.byTime(q, 1.0))
or
let q = DispatchQueue(label: self.debugDescription)
publisher
.collect(.byTimeOrCount(q, 1.0, 10))
These use a DispatchQueue, but you could just as easily use any of the other schedulers.
If you just pass in a Double for the stride, it is taken as a value in seconds.
In both cases, when the time (or count, if that version is specified) is elapsed, the operator will publish an array of the collected values to its subscribers in turn.
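To round this out, here is a minimal self-contained sketch of byTimeOrCount (my own example, not Apple's; the subject and the numbers are arbitrary), showing the buffer flushing early whenever the count is reached:
import Combine
import Foundation

var storage = Set<AnyCancellable>()
let subject = PassthroughSubject<Int, Never>()

subject
    .collect(.byTimeOrCount(DispatchQueue.main, .seconds(1), 3))
    .sink { print($0) }
    .store(in: &storage)

(1...7).forEach { subject.send($0) }
// [1, 2, 3] and [4, 5, 6] are published as soon as the count reaches 3;
// [7] follows once the 1-second window elapses (assuming the run loop stays alive)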
I'm creating a library for creating data processing workflows using Reactor 3. Each task will have an input flux and an output flux. The input flux is provided by the user. The output flux is created by the library. Tasks can be chained to form a DAG. Something like this: (It's in Kotlin)
val base64 = task<String, String>("base64") {
    input { Flux.just("a", "b", "c", "d", "e") }
    outputFn { ... get the output values ... }
    scriptFn { ... do some stuff ... }
}

val step2 = task<List<String>, String>("step2") {
    input { base64.output.buffer(3) }
    outputFn { ... }
    scriptFn { ... }
}
I have the requirement to limit concurrency for the whole workflow. Only a configured number of inputs can be processed at once. In the example above for a limit of 3 this would mean task base64 would run with inputs "a", "b", and "c" first, then wait for each to complete before processing "d", "e" and the "step2" tasks.
How can I apply such limitations when creating output fluxes from input fluxes? Could a TopicProcessor somehow be applied? Maybe some sort of custom scheduler or processor? How would back-pressure work? Do I need to worry about creating a buffer?
Backpressure propagates from the final subscriber upward, across the whole chain. But operators in the chain can ask for data in advance (prefetch) or even "rewrite" the request. For example, in the case of buffer(3), if that operator receives a request(1) it will perform a request(3) upstream ("1 buffer == max 3 elements, so I can request enough from my source to fill the 1 buffer I was asked for").
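You can watch this request rewriting happen. A minimal sketch of my own (not from the answer): the subscriber asks for a single buffer, and doOnRequest shows buffer(3) turning that into a request for 3 elements upstream.
import java.util.List;
import org.reactivestreams.Subscription;
import reactor.core.publisher.BaseSubscriber;
import reactor.core.publisher.Flux;

Flux.range(1, 9)
        .doOnRequest(n -> System.out.println("upstream request: " + n))
        .buffer(3)
        .subscribe(new BaseSubscriber<List<Integer>>() {
            @Override
            protected void hookOnSubscribe(Subscription s) {
                request(1); // downstream asks for one buffer
            }
            @Override
            protected void hookOnNext(List<Integer> buffer) {
                System.out.println("got " + buffer);
            }
        });
// prints "upstream request: 3" and then "got [1, 2, 3]"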
If the input is always provided by the user, this will be hard to abstract away...
There is no easy way to rate limit sources across multiple pipelines or even multiple subscriptions to a given pipeline (a Flux).
Using a shared Scheduler in multiple publishOn will not work because publishOn selects a Worker thread and sticks to it.
However, if your question is more specifically about the base64 task being limited, maybe the effect can be obtained from flatMap's concurrency parameter?
input.flatMap(someString -> asyncProcess(someString), 3, 1);
This will let at most 3 instances of asyncProcess run at a time, and each time one terminates it starts a new one with the next value from input.
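Here is a runnable sketch of that effect (my own; delayElement stands in for real async work):
import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

Flux.just("a", "b", "c", "d", "e")
        .flatMap(s -> Mono.just(s)
                .doOnSubscribe(sub -> System.out.println("start " + s))
                .delayElement(Duration.ofSeconds(1)), // simulated async work
            3, 1)                                     // concurrency = 3, prefetch = 1
        .doOnNext(s -> System.out.println("done " + s))
        .blockLast();
// "start a", "start b", "start c" appear immediately;
// "start d" only appears after one of the first three completes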
There's a variable in my module, and there's a receive method to renew the variable's value. Multiple processes call this method simultaneously. I need to lock this variable while one process is modifying it. Sample below:
mytest.erl
%%%-------------------------------------------------------------------
-module(mytest).

%% API
-export([start_link/0, display/1, callDisplay/2]).

start_link() ->
    Pid = spawn(mytest, display, ["Hello"]),
    Pid.

display(Val) ->
    io:format("It started: ~p", [Val]),
    NextVal =
        receive
            {call, Msg} ->
                NewVal = Val ++ " " ++ Msg ++ " ",
                NewVal;
            stop ->
                true
        end,
    display(NextVal).

callDisplay(Pid, Val) ->
    Pid ! {call, Val}.
Start it
Pid=mytest:start_link().
Two processes call it at the same time
P1=spawn(mytest,callDisplay,[Pid,"Walter"]),
P2=spawn(mytest,callDisplay,[Pid,"Dave"]).
I hoped it would append "Walter" and "Dave" one by one, like "Hello Walter Dave"; however, when there are many of them running together, some names (Walter, Dave, etc.) get overwritten.
Because when P1 and P2 start at the same time, Val is "Hello" for both. P1 adds "Walter" to produce "Hello Walter", and P2 adds "Dave" to produce "Hello Dave". P1 saves its result to NextVal first as "Hello Walter", then P2 saves its result to NextVal as "Hello Dave", so the result will be "Hello Dave". "Hello Walter" is replaced by "Hello Dave", and "Walter" is lost forever.
Is there any way I can lock "Val", so that when we add "Walter", "Dave" will wait until the value update is done?
Even though it's an old question, it's worth explaining.
From what you said, and if I understand correctly, you expected to see "Hello Walter" and "Hello Dave". However, you're seeing successive names appended to the former, as in "Hello Walter Dave...".
This behavior is normal, and to see why, let's look briefly at the Erlang memory model. Erlang process memory is divided into three main parts:
Process Control Block (PCB):
This holds the process pid, registered name, tables, state, and pointers to messages in its queue.
Stack:
This holds function parameters, local variables, and function return addresses.
Private Heap: This holds incoming message compound data like tuples, lists, and binaries (not larger than 64 bytes).
All data in these memory areas belongs to, and is private to, the owning process.
Stage 1:
When Pid = spawn(mytest, display, ["Hello"]) is called, the server process is created, then the display function is called with "Hello" passed as an argument. Since display/1 is executed in the server process, the "Hello" argument lives in the server process's stack. Execution of display/1 continues until it reaches the receive clause, where it blocks and awaits a message matching your patterns.
Stage 2:
Now P1 starts; it executes ServerPid ! {call, "Walter"}, then P2 executes ServerPid ! {call, "Dave"}. In both cases, Erlang makes a copy of the message and sends it to the server process's mailbox (private heap). The copied message in the mailbox belongs to the server process, not the client.
Now, when {call, "Walter"} is matched, Msg gets bound to "Walter".
From Stage 1, we know Val is bound to "Hello", so NewVal gets bound to Val ++ " " ++ Msg = "Hello Walter".
At this point, P2's message, {call, "Dave"}, is still in the server's mailbox awaiting the next receive clause, which will happen in the next recursive call to display/1. NextVal gets bound to NewVal, and the recursive call to display/1 is made with "Hello Walter" passed as the argument. This gives the first print, "Hello Walter", which now also lives in the server process's stack.
When the receive clause is reached again, P2's message {call, "Dave"} is matched.
Now NewVal and NextVal get bound to "Hello Walter" ++ " " ++ "Dave" = "Hello Walter Dave". This gets passed to display/1 as the new Val, printing "Hello Walter Dave". In a nutshell, this variable is updated on every server loop; it serves the same purpose as the State term in the gen_server behaviour. In your case, successive client calls just append the message to this server state variable. Now to your question:
Is there any way I can lock Val, so that when we add "Walter", "Dave" will wait until the value update is done?
No. Not by locking; Erlang does not work that way.
There are no process-locking constructs, because none are needed.
Data (variables) are always immutable and private to the process that created them (except large binaries, which live in the shared heap).
Also, it's not the actual message you used in the Pid ! Msg construct that is processed by the receiving process; it's a copy. The Val parameter in your display/1 function is private and belongs to the server process, because it lives in the server's stack memory and every call to display/1 is made by the server process itself. So there is no way any other process can lock, or even see, that variable.
Yes. By sequential message processing.
This is exactly what the server process is doing: pulling one message at a time from its queue. When {call, "Walter"} was taken, {call, "Dave"} was waiting in the queue. The reason you see the unexpected greeting is that each call changes the server state - the display/1 parameter for the next display/1 call, which then processes {call, "Dave"}.
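If what you actually want is for every caller to greet against the same fixed base string, keep the base out of the accumulated state; the mailbox already serializes all updates, so no lock is needed. A minimal sketch (module, function, and message names are mine, not from the question):
-module(greeter).
-export([start/0, greet/2, loop/1]).

start() ->
    spawn(greeter, loop, ["Hello"]).

%% One message at a time: the mailbox serializes every update.
loop(Base) ->
    receive
        {greet, From, Name} ->
            From ! {greeting, Base ++ " " ++ Name},
            loop(Base); % Base never accumulates
        stop ->
            ok
    end.

%% Synchronous call: send the request, then wait for the reply.
greet(Pid, Name) ->
    Pid ! {greet, self(), Name},
    receive {greeting, G} -> G end.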
Last night I learned about the /redo option for when you return from a function. It lets you return another function, which is then invoked at the calling site, reinvoking the evaluator from the same position.
>> foo: func [a] [(print a) (return/redo (func [b] [print b + 10]))]
>> foo "Hello" 10
Hello
20
Even though foo is a function that only takes one argument, it now acts like a function that took two arguments. Something like that would otherwise require the caller to know you were returning a function, and that caller would have to manually use the do evaluator on it.
Thus without return/redo, you'd get:
>> foo: func [a] [(print a) (return (func [b] [print b + 10]))]
>> foo "Hello" 10
Hello
== 10
foo consumed its one parameter and returned a function by value (which was not invoked, thus the interpreter moved on). Then the expression evaluated to 10. If return/redo did not exist you'd have had to write:
>> do foo "Hello" 10
Hello
20
This keeps the caller from having to know (or care) whether you've chosen to return a function to execute. And it's cool because you can do things like tail call optimization, or write a wrapper for the return functionality itself. Here's a variant of return that prints a message but still exits the function and provides the result:
>> myreturn: func [] [(print "Leaving...") (return/redo :return)]
>> foo: func [num] [myreturn num + 10]
>> foo 10
Leaving...
== 20
But functions aren't the only things that have behavior in do. So if this is a general pattern for "removing the need for a DO at the callsite", then why doesn't this print anything?
>> test: func [] [return/redo [print "test"]]
>> test
== [print "test"]
It just returned the block by value, like a normal return would have. Shouldn't it have printed out "test"? That's what do would...uh, do with it:
>> do [print "test"]
test
The short answer is because it is generally unnecessary to evaluate a block at the call point, because blocks in Rebol don't take parameters so it mostly doesn't matter where they are evaluated. However, that "mostly" may need some explanation...
It comes down to two interesting features of Rebol: static binding, and how do of a function works.
Static Binding and Scopes
Rebol doesn't have scoped word bindings, it has static direct word bindings. Sometimes it seems like we have lexical scope, but we really fake that by updating the static bindings each time we're building a new "scoped" code block. We can also rebind words manually whenever we want.
What that means for us in this case though, is that once a block exists, its bindings and values are static - they're not affected by where the block is physically located, or where it is being evaluated.
However, and this is where it gets tricky, function contexts are weird. While the bindings of words bound to a function context are static, the set of values assigned to those words are dynamically scoped. It's a side effect of how code is evaluated in Rebol: What are language statements in other languages are functions in Rebol, so a call to if, for instance, actually passes a block of data to the if function which if then passes to do. That means that while a function is running, do has to look up the values of its words from the call frame of the most recent call to the function that hasn't returned yet.
This does mean that if you call a function and return a block of code with words bound to its context, evaluating that block will fail after the function returns. However, if your function calls itself and that call returns a block of code with its words bound to it, evaluating that block before your function returns will make it look up those words in the call frame of the current call of your function.
This is the same for whether you do or return/redo, and affects inner functions as well. Let me demonstrate:
Function returning code that is evaluated after the function returns, referencing a function word:
>> a: 10 do do has [a] [a: 20 [a]]
** Script error: a word is not bound to a context
** Where: do
** Near: do do has [a] [a: 20 [a]]
Same, but with return/redo and the code in a function:
>> a: 10 do has [a] [a: 20 return/redo does [a]]
** Script error: a word is not bound to a context
** Where: function!
** Near: [a: 20 return/redo does [a]]
Code do version, but inside an outer call to the same function:
>> do f: function [x] [a: 10 either zero? x [do f 1] [a: 20 [a]]] 0
== 10
Same, but with return/redo and the code in a function:
>> do f: function [x] [a: 10 either zero? x [f 1] [a: 20 return/redo does [a]]] 0
== 10
So in short, with blocks there is usually no advantage to doing the block anywhere other than where it is defined, and if you want to, it is easier to use another call to do instead. Self-calling recursive functions that need to return code to be executed in outer calls of the same function are an exceedingly rare code pattern that I have never seen used in Rebol code at all.
It could be possible to change return/redo so it would handle blocks as well, but it probably isn't worth the increased overhead to return/redo to add a feature that is only useful in rare circumstances and already has a better way to do it.
However, that brings up an interesting point: If you don't need return/redo for blocks because do does the same job, doesn't the same apply to functions? Why do we need return/redo at all?
How DO of a Function Works
Basically, we have return/redo because it uses exactly the same code that we use to implement do of a function. You might not realize it, but do of a function is really unusual.
In most programming languages that can call a function value, you have to pass the parameters to the function as a complete set, sort of how R3's apply function works. Regular Rebol function calling causes some unknown-ahead-of-time number of additional evaluations to happen for its arguments using unknown-ahead-of-time evaluation rules. The evaluator figures out these evaluation rules at runtime and just passes the results of the evaluation to the function. The function itself doesn't handle the evaluation of its parameters, or even necessarily know how those parameters were evaluated.
However, when you do a function value explicitly, that means passing the function value to a call to another function, a regular function named do, and then that magically causes the evaluation of additional parameters that weren't even passed to the do function at all.
Well it's not magic, it's return/redo. The way do of a function works is that it returns a reference to the function in a regular shortcut-return value, with a flag in the shortcut-return value that tells the interpreter that called do to evaluate the returned function as if it were called right there in the code. This is basically what is called a trampoline.
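You can watch this from ordinary code: do of a function value picks its arguments up from the code that follows the call to do itself. A tiny sketch of my own, in the same spirit as the examples above:
>> add1: func [x] [x + 1]
>> do :add1 10
== 11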
Here's where we get to another interesting feature of Rebol: The ability to shortcut-return values from a function is built into the evaluator, but it doesn't actually use the return function to do it. All of the functions you see from Rebol code are wrappers around the internal stuff, even return and do. The return function we call just generates one of those shortcut-return values and returns it; the evaluator does the rest.
So in this case, what really happened is that all along we had code that did what return/redo does internally, but Carl decided to add an option to our return function to set that flag, even though the internal code doesn't need return to do so because the internal code calls the internal function. And then he didn't tell anyone that he was making the option externally available, or why, or what it did (I guess you can't mention everything; who has the time?). I have the suspicion, based on conversations with Carl and some bugs we've been fixing, that R2 handled do of a function differently, in a way that would have made return/redo impossible.
That does mean that the handling of return/redo is pretty thoroughly oriented towards function evaluation, since that is its entire reason for existing at all. Adding any overhead to it would add overhead to do of a function, and we use that a lot. Probably not worth extending it to blocks, given how little we'd gain and how rarely we'd get any benefit at all.
For return/redo of a function though, it seems to be getting more and more useful the more we think about it. In the last day we've come up with all sorts of tricks that this enables. Trampolines are useful.
While the question originally asked why return/redo did not evaluate blocks, there were also formulations like: "is cool because you can do things like tail call optimization", "[can write] a wrapper for the return functionality", "it seems to be getting more and more useful the more we think about it".
I do not think these are true. My first example demonstrates a case where return/redo can really be used, an example being in the "area of expertise" of return/redo, so to speak. It is a variadic sum function called sumn:
use [result collect process] [
    collect: func [:value [any-type!]] [
        unless value? 'value [return process result]
        append/only result :value
        return/redo :collect
    ]
    process: func [block [block!] /local result] [
        result: 0
        foreach value reduce block [result: result + value]
        result
    ]
    sumn: func [] [
        result: copy []
        return/redo :collect
    ]
]
This is the usage example:
>> sumn 1 * 2 2 * 3 4
== 12
Variadic functions taking an "unlimited number" of arguments are not as useful in Rebol as they may look at first sight. For example, if we wanted to use the sumn function in a small script, we would have to wrap it into a paren to indicate where it should stop collecting arguments:
result: (sumn 1 * 2 2 * 3 4)
print result
This is not any better than using a more standard (non-variadic) alternative called e.g. block-sum and taking just one argument, a block. The usage would be like
result: block-sum [1 * 2 2 * 3 4]
print result
Of course, if the function can somehow detect which is its last argument without needing an enclosing paren, we really gain something. In this case we could use the #[unset!] value as the sumn stopping argument, but that does not spare typing either:
result: sumn 1 * 2 2 * 3 4 #[unset!]
print result
Seeing the example of a return wrapper, I would say that return/redo is not well suited for return wrappers; they are outside its area of expertise. To demonstrate that, here is a return wrapper, written in Rebol 2, doing something that actually is outside return/redo's area of expertise:
myreturn: func [
    {my RETURN wrapper returning the string "indefinite" instead of #[unset!]}
    ; the [throw] attribute makes this function a RETURN wrapper in R2:
    [throw]
    value [any-type!] {the value to return}
] [
    either value? 'value [return :value] [return "indefinite"]
]
Testing in R2:
>> do does [return #[unset!]]
>> do does [myreturn #[unset!]]
== "indefinite"
>> do does [return 1]
== 1
>> do does [myreturn 1]
== 1
>> do does [return 2 3]
== 2
>> do does [myreturn 2 3]
== 2
Also, I do not think it is true that return/redo helps with tail call optimizations. There are examples of how tail calls can be implemented without using return/redo at the www.rebol.org site. As said, return/redo was tailor-made to support the implementation of variadic functions, and it is not flexible enough for other purposes as far as argument passing is concerned.