Dart: difference between Future.value vs Future.microtask

What's the difference between Future.value and Future.microtask?
Case 1:
Future.microtask(() => 1).then(print);
Future.microtask(() => Future(() => 2)).then(print);
Future.value(3).then(print);
Future.value(Future(() => 4)).then(print);
Output for this is:
1
3
4
2
Case 2: And when I swap the statements:
Future.value(3).then(print);
Future.value(Future(() => 4)).then(print);
Future.microtask(() => 1).then(print);
Future.microtask(() => Future(() => 2)).then(print);
output is:
3
1
4
2
Questions:
What's the difference between Future.value and Future.microtask?
Which of the two has higher priority? Does Future.value complete before Future.microtask?
Why does the order of the final output (4 and 2) remain unchanged?
Can someone explain this behavior in terms of the event and microtask queues?

Future.microtask schedules a microtask to execute the argument function. It then completes the future with the result of that function call.
Future() and Future.delayed schedule a timer task (the former with Duration.zero) to execute a function, and complete the future with the result of that function call.
Future.value takes a value, not a function to call. If you do Future.value(computation()), the computation is performed (or at least started, in case it's async) right now.
If you do Future.microtask(computation), the computation is started in a later microtask.
In each case, if the function returns a future or the value passed to Future.value is a future, then you'll also have to wait for that future to complete, before the future returned by the Future constructor is completed with the same result.
For the concrete example:
Future.value(3).then(print);
This creates a future completed with the value 3.
However, futures promise not to invoke a callback like then(print) synchronously while then is being called, so it schedules a microtask to actually call the print callback at a later time. That's why you get an extra delay there.
In more detail:
Future.microtask(() => 1).then(print);
// This `Future.microtask(...)` creates future, call it F1,
// and schedules a microtask M1 to call `() => 1` later.
// Then adds callback C1 (`then(print)`) to F1, but F1 isn't completed yet,
// so nothing further happens.
Future.microtask(() => Future(() => 2)).then(print);
// Creates future F2 (`Future.microtask(...)`),
// schedules microtask M2 to run `() => Future(() => 2)` later,
// then callback C2 (`.then(print)`) to F2.
Future.value(3).then(print);
// Creates future F3 with value 3. Adds C3 (`then(print)`) to F3.
// Since F3 is complete, it schedules M3 to invoke C3.
Future.value(Future(() => 4)).then(print);
// Creates future F4 (`Future(() => 4)`)
// which starts *timer* T1 with duration zero to run `() => 4`.
// Then creates future F5 (`Future.value(...)`) with "value" F4.
// Completing with a future adds a callback C4 to F4,
// to notify F5 when a result is ready.
// Then adds callback C5 (`then(print)`) to F5.
That's what happens immediately. Then the event/microtask loop takes over.
Eventually M1 runs. This evaluates () => 1 to the value 1.
Then F1 is completed with the value 1.
Then F1 notifies all its existing callbacks, which invokes C1 with 1.
Which prints "1".
Then M2 runs. This evaluates Future(() => 2).
That creates future F6 (the Future(...)) and a timer T2 with duration zero.
It then completes F2 with the future F6,
which means adding a callback C6 to F6 to notify F2 of a result.
Then M3 runs. This invokes C3 with the value 3.
Which prints "3".
Now all microtasks are done.
Timer T1 runs which evaluates () => 4 to 4.
F4 completes with the value 4.
F4 calls its existing callbacks, C4 with 4.
That completes F5 with the value 4,
and calls its existing callback C5 with the value 4.
Which prints "4".
Timer T2 runs () => 2 and completes F6 with the value 2.
This runs F6's existing callback C6 with the value 2.
That callback completes F2 with the value 2,
and it calls F2's existing callback C2 with the value 2,
Which prints "2".
So, three microtasks, two timers, and some future result propagation later, you get the result you see.
The second example can be done in the same way:
Future.value(3).then(print);
// Schedule a microtask to print 3.
Future.value(Future(() => 4)).then(print);
// Schedule a timer to (going through an extra future) print 4.
Future.microtask(() => 1).then(print);
// Schedule a microtask to compute and print 1.
Future.microtask(() => Future(() => 2)).then(print);
// Schedule a microtask to schedule a timer to eventually print 2.
The microtask-only ones, 3 and 1, should print first in order.
Then it should print 4, and then 2, because the 2-timer is scheduled after the 4-timer. 3-1-4-2, which is what you see.
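The same two-queue model exists in JavaScript, which can make the ordering easy to experiment with. Below is a rough analogue of the first example; the mapping is an approximation I'm assuming for illustration, not a Dart API: queueMicrotask(fn) plays the role of Future.microtask(fn), setTimeout(fn, 0) plays the role of Future(fn) on the event/timer queue, and Promise.resolve(v).then plays the role of Future.value(v).then (the callback still runs in a later microtask).

```javascript
const order = [];
const log = v => { order.push(v); console.log(v); };

queueMicrotask(() => log(1));                      // M1
queueMicrotask(() => setTimeout(() => log(2), 0)); // M2 only *schedules* timer T2
Promise.resolve(3).then(log);                      // M3: the then-callback is deferred to a microtask
setTimeout(() => log(4), 0);                       // T1, scheduled before T2 exists

// Microtasks drain first (1, 3), then timers fire in scheduling order (4, 2),
// so this prints 1, 3, 4, 2, matching the Dart output.
```

Swapping the statements, as in the second case, reorders only the microtasks relative to each other, which is why 4 and 2 stay at the end in both cases.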


Invoke concurrent functions sequentially

Overview
I have a function f1 which is a non-async function.
f1 gets called multiple times, and I have no control over the calling of f1.
When f1 gets called, I would like to invoke an async function f2.
Aim:
I would like each f2 to complete before the next f2 executes.
Question:
How can I ensure f2 executes in sequence? (sample code below)
//I have no control over f1
//f1 can get called multiple times in quick succession.
func f1() {
    Task {
        try await f2() // Next time f1 gets called it should wait for the previous f2 to complete, then execute f2
    }
}
func f2() async throws {}
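The question is about Swift, but the underlying pattern is language-agnostic: remember the previous call's completion and chain the next call onto it. Here is a sketch in JavaScript; the names f1/f2 mirror the question, and the staggered delays are made up so that, without serialization, later calls would finish first.

```javascript
let queue = Promise.resolve(); // an already-finished "previous call" to start from
const results = [];

async function f2(id) {
  // Simulate async work; f2(3) would finish first if the calls ran concurrently.
  await new Promise(resolve => setTimeout(resolve, 30 - id * 10));
  results.push(id);
}

function f1(id) {
  // Each call waits for whatever is already queued, so f2 runs strictly in sequence.
  queue = queue.then(() => f2(id));
}

f1(1); f1(2); f1(3);
queue.then(() => console.log(results)); // [ 1, 2, 3 ] despite the inverted delays
```

In Swift, the equivalent idea is usually expressed by routing the calls through a single actor, or by storing the previous Task in a property and awaiting its value before starting the next f2.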

Java RX's Flux.merge and switchIfEmpty

I am trying to understand how Flux.merge and switchIfEmpty (from Project Reactor) work together in the code below, as I am a bit confused by the results I am seeing, which is no doubt due to my not fully grasping the API.
My question is ... If the call to wOneRepository... returns an empty list or the call to wTwoRepository... returns an empty list, will the switchIfEmpty code get executed? Or will it only get executed if both calls return an empty list?
Flux<Widget> f1 = wOneRepository.findWidgets(email).onErrorResume(error -> Flux.empty());
Flux<Widget> f2 = wTwoRepository.findWidgets(email).onErrorResume(error -> Flux.empty());
return Flux.merge(f1, f2)
.switchIfEmpty(Flux.error(new IllegalArgumentException("Widgets were not found")));
Thank you
switchIfEmpty() will only be called if the upstream Flux completes without emitting anything, and that will only happen if both f1 and f2 complete without emitting anything. So, if both findWidget calls fail, or both return empty Flux instances, or some combination of those, then switchIfEmpty will be called. If either f1 or f2 emits a Widget, then that Widget will be emitted from the merge operator, which means switchIfEmpty will not be called.
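To make the rule concrete, here is a toy model in JavaScript using plain arrays (this is not Reactor and elides all the asynchrony; it only captures the emptiness semantics): merge emits everything both sources emit, and the fallback fires only when the merged result is empty, i.e. only when both sources were empty.

```javascript
function mergeWithFallback(sourceA, sourceB, fallback) {
  // "merge" emits every element from both sources
  const merged = [...sourceA, ...sourceB];
  // "switchIfEmpty" switches to the fallback only if nothing was emitted at all
  return merged.length > 0 ? merged : fallback();
}

console.log(mergeWithFallback(["w1"], [], () => ["fallback"])); // [ 'w1' ]   (one side empty: no fallback)
console.log(mergeWithFallback([], [], () => ["fallback"]));     // [ 'fallback' ]   (both empty)
```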

Apache Beam - Sliding Windows Only Emit Earliest Active Window

I'm trying to use Apache Beam (via Scio) to run a continuous aggregation of the last 3 days of data (processing time) from a streaming source and output results from the earliest, active window every 5 minutes. Earliest meaning the window with the earliest start time, active meaning that the end of the window hasn't yet passed. Essentially I'm trying to get a 'rolling' aggregation by dropping the non-overlapping period between sliding windows.
A visualization of what I'm trying to accomplish with an example sliding window of size 3 days and period 1 day:
early firing - ^ no firing - x
|
** stop firing from this window once time passes this point
^ ^ ^ ^ ^ ^ ^ ^
| | | | | | | | ** stop firing from this window once time passes this point
w1: +====================+^ ^ ^
x x x x x x x | | |
w2: +====================+^ ^ ^
x x x x x x x | | |
w3: +====================+
time: ----d1-----d2-----d3-----d4-----d5-----d6-----d7---->
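The selection rule being described (earliest start time among windows whose end hasn't yet passed) can be sketched as plain logic. This is a hypothetical illustration in JavaScript, not a Beam API:

```javascript
function earliestActiveWindow(windows, now) {
  // windows: [{start, end}, ...]; "active" means the window's end hasn't passed yet
  return windows
    .filter(w => w.end > now)               // drop closed windows
    .sort((a, b) => a.start - b.start)[0];  // pick the earliest start among the rest
}

// Three 3-day windows sliding by 1 day, evaluated on day 3.5:
const slidingWindows = [
  { start: 0, end: 3 },  // w1: already closed
  { start: 1, end: 4 },  // w2: active, earliest start
  { start: 2, end: 5 },  // w3: active
];
console.log(earliestActiveWindow(slidingWindows, 3.5)); // { start: 1, end: 4 }
```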
I've tried using sliding windows (size=3 days, period=5 min), but they produce a new window for every 3 days/5 min combination in the future and are emitting early results for every window. I tried using trigger = AfterWatermark.pastEndOfWindow(), but I need early results when the job first starts. I've tried comparing the pane data (isLast, timestamp, etc.) between windows but they seem identical.
My most recent attempt, which seems somewhat of a hack, included attaching window information to each key in a DoFn, re-windowing into a fixed window, and attempting to group and reduce to the oldest window from the attached data, but the final reduceByKey doesn't seem to output anything.
DoFn to attach window information
// ValueType is just a case class I'm using for objects
// ValueType is just a case class I'm using for objects
type DoFnT = DoFn[KV[String, ValueType], KV[String, (ValueType, Instant)]]
class Test extends DoFnT {
  // Window.toString looks like the following:
  // [2020-05-16T23:57:00.000Z..2020-05-17T00:02:00.000Z)
  def parseWindow(window: String): Instant = {
    Instant.parse(
      window
        .stripPrefix("[")
        .stripSuffix(")")
        .split("\\.\\.")(1))
  }
  @ProcessElement
  def process(
      context: DoFnT#ProcessContext,
      window: BoundedWindow): Unit = {
    context.output(
      KV.of(
        context.element().getKey,
        (context.element().getValue, parseWindow(window.toString))
      )
    )
  }
}
sc
  .pubsubSubscription(...)
  .keyBy(_.key)
  .withSlidingWindows(
    size = Duration.standardDays(3),
    period = Duration.standardMinutes(5),
    options = WindowOptions(
      accumulationMode = DISCARDING_FIRED_PANES,
      allowedLateness = Duration.ZERO,
      trigger = Repeatedly.forever(
        AfterWatermark.pastEndOfWindow()
          .withEarlyFirings(
            AfterProcessingTime
              .pastFirstElementInPane()
              .plusDelayOf(Duration.standardMinutes(1))))))
  .reduceByKey(ValueType.combineFunction())
  .applyPerKeyDoFn(new Test())
  .withFixedWindows(
    duration = Duration.standardMinutes(5),
    options = WindowOptions(
      accumulationMode = DISCARDING_FIRED_PANES,
      trigger = AfterWatermark.pastEndOfWindow(),
      allowedLateness = Duration.ZERO))
  .reduceByKey((x, y) => if (x._2.isBefore(y._2)) x else y)
  .saveAsCustomOutput(
    TextIO.write()...
  )
Any suggestions?
First, regarding processing time: If you want to window according to processing time, you should set your event time to the processing time. This is perfectly fine - it means that the event you are processing is the event of ingesting the record, not the event that the record represents.
Now you can use sliding windows off-the-shelf to get the aggregation you want, grouped the way you want.
But you are correct that it is a bit of a headache to trigger the way you want. Triggers are not easily expressive enough to say "output the last 3 day aggregation but only begin when the window is 5 minutes from over" and even less able to express "for the first 3 day period from pipeline startup, output the whole time".
I believe a stateful ParDo(DoFn) will be your best choice. State is partitioned per key and window. Since you want to have interactions across 3 day aggregations you will need to run your DoFn in the global window and manage the partitioning of the aggregations yourself. You tagged your question google-cloud-dataflow and Dataflow does not support MapState so you will need to use a ValueState that holds a map of the active 3 day aggregations, starting new aggregations as needed and removing old ones when they are done. Separately, you can easily track the aggregation from which you want to periodically output, and have a timer callback that periodically emits the active aggregation. Something like the following pseudo-Java; you can translate to Scala and insert your own types:
DoFn<> {
  @StateId("activePeriod") StateSpec<ValueState<Period>> activePeriod = StateSpecs.value();
  @StateId("accumulators") StateSpec<ValueState<Map<Period, Accumulator>>> accumulators = StateSpecs.value();
  @TimerId("nextPeriod") TimerSpec nextPeriod = TimerSpecs.timer(TimeDomain.EVENT_TIME);
  @TimerId("output") TimerSpec outputTimer = TimerSpecs.timer(TimeDomain.EVENT_TIME);
  @ProcessElement
  public void process(
      @Element KV<String, ValueType> element,
      @TimerId("nextPeriod") Timer nextPeriod,
      @TimerId("output") Timer output,
      @StateId("activePeriod") ValueState<Period> activePeriod,
      @StateId("accumulators") ValueState<Map<Period, Accumulator>> accumulators) {
    // Set nextPeriod if it isn't already running
    // Set output if it isn't already running
    // Set activePeriod if it isn't already set
    // Add the element to the appropriate accumulator
  }
  @OnTimer("nextPeriod")
  public void onNextPeriod(
      @TimerId("nextPeriod") Timer nextPeriod,
      @StateId("activePeriod") ValueState<Period> activePeriod) {
    // Set activePeriod to the next one
    // Clear the period we will never read again
    // Reset the timer (there's a one-time change in this logic after the first window; add a flag for this)
  }
  @OnTimer("output")
  public void onOutput(
      @TimerId("output") Timer output,
      @StateId("activePeriod") ValueState<Period> activePeriod,
      @StateId("accumulators") ValueState<Map<Period, Accumulator>> accumulators) {
    // Output the current accumulator for the active period
    // Reset the timer
  }
}
I do have some reservations about this, because the outputs we are working so hard to suppress are not comparable to the outputs that are "replacing" them. I would be interested in learning more about the use case. It is possible there is a more straightforward way to express the result you are interested in.

Confusion about Lua coroutine's resume and yield functions

I am learning Lua from this video tutorial, which has this piece of code:
co = coroutine.create(function()
  for i = 1, 5 do
    print(coroutine.yield(i))
  end
end)
print(coroutine.resume(co,1,2))
print(coroutine.resume(co,3,4))
print(coroutine.resume(co,5,6))
print(coroutine.resume(co,7,8))
print(coroutine.resume(co,9,10))
print(coroutine.resume(co,11,12))
The output is like this:
true 1
3 4
true 2
5 6
true 3
7 8
true 4
9 10
true 5
11 12
true
But I don't understand how yield and resume pass parameters to each other, and why yield doesn't output the first 1, 2 that resume passes to it. Could someone please explain? Thanks
Normal Lua functions have one entry (where arguments get passed in) and one exit (where return values are passed out):
local function f( a, b )
  print( "arguments", a, b )
  return "I'm", "done"
end
print( "f returned", f( 1, 2 ) )
--> arguments 1 2
--> f returned I'm done
The arguments are bound to the parameter names (local variables) listed inside the parentheses. The return values listed in the return statement can be retrieved by putting the function call expression on the right-hand side of an assignment statement, or inside a larger expression (e.g. another function call).
There are alternative ways to call a function. E.g. pcall() calls a function and catches any runtime errors that may be raised inside. Arguments are passed in by putting them as arguments into the pcall() function call (right after the function itself). pcall() also prepends an additional return value that indicates whether the function exited normally or via an error. The inside of the called function is unchanged.
print( "f returned", pcall( f, 1, 2 ) )
--> arguments 1 2
--> f returned true I'm done
You can call the main function of a coroutine by using coroutine.resume() instead of pcall(). The way the arguments are passed, and the extra return value stays the same:
local th = coroutine.create( f )
print( "f returns", coroutine.resume( th, 1, 2 ) )
--> arguments 1 2
--> f returns true I'm done
But with coroutines you get another way to (temporarily) exit the function: coroutine.yield(). You can pass values out via coroutine.yield() by putting them as arguments into the yield() function call. Those values can be retrieved outside as return values of the coroutine.resume() call instead of the normal return values.
However, you can re-enter the yielded coroutine by again calling coroutine.resume(). The coroutine continues where it left off, and the extra values passed to coroutine.resume() are available as return values of the yield() function call that suspended the coroutine before.
local function g( a, b )
  print( "arguments", a, b )
  local c, d = coroutine.yield( "a" )
  print( "yield returned", c, d )
  return "I'm", "done"
end
local th = coroutine.create( g )
print( "g yielded", coroutine.resume( th, 1, 2 ) )
print( "g returned", coroutine.resume( th, 3, 4 ) )
--> arguments 1 2
--> g yielded true a
--> yield returned 3 4
--> g returned true I'm done
Note that the yield need not be directly in the main function of the coroutine, it can be in a nested function call. The execution jumps back to the coroutine.resume() that (re-)started the coroutine in the first place.
Now to your question why the 1, 2 from the first resume() doesn't appear in your output: Your coroutine main function doesn't list any parameters and so ignores all arguments that are passed to it (on first function entry). On a similar note, since your main function doesn't return any return values, the last resume() doesn't return any extra return values besides the true that indicates successful execution as well.
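As an aside, the same resume/yield plumbing exists in JavaScript generators, which may make the data flow easier to see if you know them (this is a comparison, not Lua: gen.next(v) plays the role of coroutine.resume(co, v), minus the success flag, and a yield expression evaluates to whatever the next call to next() passes in):

```javascript
const lines = [];
const log = (...args) => { lines.push(args.join(" ")); console.log(...args); };

function* g(a, b) {
  log("arguments", a, b);
  const [c, d] = yield "a";   // suspend; resumes with the argument of the *next* next()
  log("yield returned", c, d);
  return "done";
}

const th = g(1, 2);                        // binds a, b but does not run the body yet
log("g yielded", th.next().value);         // runs the body up to the yield
log("g returned", th.next([3, 4]).value);  // the suspended yield evaluates to [3, 4]
// arguments 1 2
// g yielded a
// yield returned 3 4
// g returned done
```

Note the same asymmetry as in Lua: the argument of the first next() call has nowhere to land, because that call starts the body rather than resuming a yield.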
co = coroutine.create(function()
  for i = 1, 5 do
    print(coroutine.yield(i))
  end
end)
We start the coroutine the first time using:
print(coroutine.resume(co,1,2))
It will run until the first yield. Our first resume call returns true followed by the arguments of that yield (here i = 1), which explains the first output line.
Our coroutine is now suspended. Once we call resume a second time:
print(coroutine.resume(co,3,4))
your first yield finally returns, and the arguments of the current resume (3, 4) are printed. The for loop's second iteration begins, coroutine.yield(2) is called, suspending the coroutine, which again makes the latest resume return true, 2, and so on.
So actually in your example coroutine.resume(co) would be sufficient for the first call as any further arguments are lost anyway.
The reason we see this behavior is subtle, but it comes down to a mismatch between "entering" and "exiting" yields, and to the order in which print and yield are called within your anonymous function.
Let's trace the execution of print(coroutine.yield(i)) iteration by iteration.
On the first iteration, coroutine.resume passes 1 and 2 to the coroutine. This is the origin point: we are not picking up from a former yield, but from the original call of the anonymous function itself, so 1 and 2 would be bound to the function's parameters, of which there are none. Inside the print expression, yield is called first; it passes out i = 1 and suspends the coroutine, leaving print uncalled.
The function then stays suspended until the next coroutine.resume, which passes 3 and 4. Execution picks up at the last yield. Remember that print was left uncalled because the first yield suspended the coroutine? Execution returns inside that print, which now runs, printing 3 and 4, since these are the values the resume passed in. The loop continues, yield is again called before print, and passes out i = 2.
Following the iterations down, the pattern behind the missing 1 and 2 becomes clear: the first iteration was an "exiting" yield with no corresponding "entering" yield, because it corresponds to the first entry into the coroutine co.
We might expect the last resume to be unpaired in the same way, but it is the mirror image: an "entering" resume with no "exiting" yield, because the loop has already finished. That is why we see 11 12, then true, with no further yielded value.
This situation is independent of the parity (even/oddness) of the for loop's iteration count. What matters is how the resume and yield pairs line up. The key point is that yield cannot return a value inside the function on the first resume, because that first resume is what starts the function inside the coroutine in the first place.

Lua: lua_resume and lua_yield argument purposes

What is the purpose of passing arguments to lua_resume and lua_yield?
I understand that on the first call to lua_resume the arguments are passed to the lua function that is being resumed. This makes sense. However I'd expect that all subsequent calls to lua_resume would "update" the arguments in the coroutine's function. However that's not the case.
What is the purpose of passing arguments to lua_resume for lua_yield to return? Can the lua function running under the coroutine have access to the arguments passed by lua_resume?
What Nicol said. You can still preserve the values from the first resume call if you want:
do
  local firstcall
  function willyield(a)
    firstcall = a
    while a do
      print(a, firstcall)
      a = coroutine.yield()
    end
  end
end
local coro = coroutine.create(willyield)
coroutine.resume(coro, 1)
coroutine.resume(coro, 10)
coroutine.resume(coro, 100)
coroutine.resume(coro)
will print
1 1
10 1
100 1
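The same point can be made with a JavaScript generator, transliterated here for comparison (this is not Lua; next(v) stands in for resume(co, v)): the parameter from the first call is an ordinary local, so later resumptions cannot overwrite it, and firstcall preserves it explicitly.

```javascript
const printed = [];
function* willyield(a) {
  const firstcall = a;
  while (a !== undefined) {
    printed.push([a, firstcall]);
    a = yield;    // picks up the argument of the next next() call
  }
}

const co = willyield(1);
co.next();    // starts the body (an argument here would be discarded, as in Lua)
co.next(10);
co.next(100);
co.next();    // resuming with nothing ends the loop
console.log(printed); // [ [ 1, 1 ], [ 10, 1 ], [ 100, 1 ] ]
```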
Lua cannot magically give the original arguments new values. They might not even be on the stack anymore, depending on optimizations. Furthermore, there's no indication where the code was when it yielded, so it may not be able to see those arguments anymore. For example, if the coroutine called a function, that new function can't see the arguments passed into the old one.
coroutine.yield() returns the arguments passed to the resume call that continues the coroutine, so that the site of the yield call can handle parameters as it so desires. It allows the code doing the resuming to communicate with the specific code doing the yielding. yield() passes its arguments as return values from resume, and resume passes its arguments as return values to yield. This sets up a pathway of communication.
You can't do that in any other way. Certainly not by modifying arguments that may not be visible from the yield site. It's simple, elegant, and makes sense.
Also, it's considered exceedingly rude to go poking at someone's values. Especially a function already in operation. Remember: arguments are just local variables filled with values. The user shouldn't expect the contents of those variables to change unless it changes them itself. They're local variables, after all. They can only be changed locally; hence the name.
A simple example:
co = coroutine.create(function (a, b)
  print("First args: ", a, b)
  coroutine.yield(a+10, b+10)
  print("Second args: ", a, b)
  coroutine.yield(a+10, b+10)
end)
print(coroutine.resume(co, 1, 2))
print(coroutine.resume(co, 3, 4))
Prints:
First args: 1 2
true 11 12
Second args: 1 2
true 11 12
This shows that the original values of the arguments a and b did not change.
