I am trying to write a JUnit test which has two resources and operators: the first operator persists data into state, and the second operator retrieves that state.
The weird thing I found is that the second operator always runs before the first one, which persists the data into the state.
So, my question is how to control the execution sequence of the task within the same flink job program?
I have a pipeline running on Dataflow that ingests files containing several thousand records. These files arrive at a steady frequency and are processed by a stateful ParDo with timers that attempts to throttle the rate of ingest by batching and holding the files until the timer fires; the files are then expanded into individual record elements by a file-processing ParDo and finally written to BigQuery destinations.
On occasion, after an intermittent event such as an OOM or an autoscaling event, I have seen Dataflow attempt to emit the files held in the stateful ParDo again once the event resolves, causing duplicate record elements downstream when the file-processing ParDo reprocesses the files. I understand that bundles are retried if there is a failure, but do the retries account for duplicates?
What exactly does exactly-once processing achieve in this context, especially with regard to the State/Timer API, since I am seeing duplicates at my destination?
Dataflow achieves exactly-once processing by ensuring that data produced by failing workers is not passed downstream (or, more precisely, if work is retried, only one successful result is consumed downstream). For example, if stage A of your pipeline is producing elements and stage B is counting them, and workers in stage A fail and are retried, duplicate elements will not be counted by stage B (though of course stage B might itself have to be retried). This also applies to state and timers: a given bundle of work is either committed in its entirety (i.e. the set of inputs is marked as consumed, and the set of outputs is committed atomically with the consumption/setting of state and timers) or entirely discarded (state and timers are left unconsumed/untouched, and the retry will not be influenced by what happened before).
What is not exactly once is interactions with external systems (due to the possibility of retries). These are instead at least once, and so to guarantee correctness all such interactions should be idempotent. Sinks often achieve this by assigning a unique id such that multiple writes can be deduplicated in the downstream system. For files, one can write to temporary files, and then rename the "winning" set of shards to the final destination after a barrier. It's not clear from your question what files you're emitting (or ingesting) but hopefully this should be helpful in understanding how the system works.
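For example, one common way to make writes idempotent is to derive a deterministic id for each record from its origin, so a retried bundle produces the same id and the destination can deduplicate. A minimal sketch in Python (the file-name/offset scheme is just an assumption for illustration, not something from your pipeline):
import hashlib

def record_id(file_name, offset):
    # Derive a deterministic id from the record's origin so that a retried
    # write produces the same id and can be deduplicated by the sink.
    return hashlib.sha256(f"{file_name}:{offset}".encode()).hexdigest()

# A retried bundle that re-reads the same record produces the same id, so an
# idempotent sink (e.g. an upsert keyed on this id) only stores it once.
assert record_id("input-001.csv", 42) == record_id("input-001.csv", 42)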
More specifically, say the initial state is {state: A, timers: [X, Y], inputs: [i, j, k]}. Suppose further that when processing the bundle (these timers and inputs) the state is updated to B, we emit elements m and n downstream, and we set a timer W.
If the bundle succeeds, the new state will be {state: B, timers: [W], inputs: []} and the elements [m, n] are guaranteed to be passed downstream. Furthermore, any competing retry of this bundle would always fail.
On the other hand, if the bundle fails (even if it "emitted" some of the elements or tried to update the state) the resulting state of the system will be {state: A, timers: [X, Y], inputs: [i, j, k]} for a fresh retry and nothing that was emitted from this failed bundle will be observed downstream.
Another way to look at it is that the set {inputs consumed, timers consumed, state modifications, timers set, outputs to produce downstream} is written to the backing "database" in a single transaction. Only a single successful attempt is ever committed; failed attempts are discarded.
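As a rough mental model (plain Python, not actual runner code; the store API here is entirely hypothetical):
def process_bundle(store, key, process_fn):
    # Read the last committed snapshot: state, pending timers, pending inputs.
    snapshot = store.read(key)
    new_state, new_timers, outputs = process_fn(snapshot)
    # Consumed inputs/timers, state modifications, newly set timers, and the
    # outputs for downstream are committed in one atomic transaction. If the
    # attempt fails before this point, nothing is committed, a retry starts
    # again from the original snapshot, and the failed attempt's outputs are
    # never observed downstream.
    store.commit(key, new_state, new_timers, outputs)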
More details can be found at https://beam.apache.org/documentation/runtime/model/
I have an UnboundedSource that generates N items (it's not in batch mode, it's a stream, albeit one that only generates a certain number of items and then stops emitting new items). Then I apply a certain PTransform to the collection I'm getting from that source. I also apply the Window.into(FixedWindows.of(...)) transform and then group the results by window using Combine. So it's kind of like this:
pipeline.apply(Read.from(new SomeUnboundedSource(...))) // extends UnboundedSource
    .apply(Window.into(FixedWindows.of(Duration.millis(5000))))
    .apply(new SomeTransform())
    .apply(Combine.globally(new SomeCombineFn()).withoutDefaults());
And I assumed that would mean new events are generated for 5 seconds, then SomeTransform is applied to the data in that 5-second window, then a new set of data is polled and therefore generated. Instead, all N events are generated first, and only after that is SomeTransform applied to the data (though the windowing works as expected). Is it supposed to work like this? Does Beam and/or the runner (I'm using the Flink runner, but the Direct runner seems to exhibit the same behavior) have some sort of queue where it stores items before passing them on to the next operator? Does that depend on what kind of UnboundedSource is used? In my case it's a generator of sorts. Is there a way to achieve the behavior that I expected, or is it unreasonable? I am very new to working with streaming pipelines in general, let alone Beam. I assume, however, it would be somewhat illogical to try to read everything from the source first, seeing as it's, you know, unbounded.
An important thing to note is that windows in Beam operate on event time, not processing time. Adding 5-second windows to your data does not prescribe how the data should be processed; it only determines how the results of aggregations are grouped. Further, windows only affect the data once an aggregation is reached, like your Combine.globally. Until that point in your pipeline the windowing you applied has no effect.
As to whether it is supposed to work that way: the Beam model doesn't specify any particular processing behavior, so other runners may process elements slightly differently. However, this is still a correct implementation. It isn't trying to read everything from the source; generally, streaming sources in Beam will attempt to read all elements available before moving on and coming back to the source later. If you were to adjust your stream to emit elements slowly over a long period of time, you would likely see more processing in between reads from the source.
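To make the "windowing only matters at the aggregation" point concrete, here is a rough sketch using the Beam Python SDK (equivalent in spirit to your Java snippet, with made-up data and timestamps):
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue

with beam.Pipeline() as p:
    (
        p
        | beam.Create([1, 2, 3, 4])
        # Attach event-time timestamps (in seconds); windows are defined on
        # these, not on when the elements happen to be processed.
        | beam.Map(lambda x: TimestampedValue(x, x * 3))
        | beam.WindowInto(FixedWindows(5))
        # The element-wise transform runs regardless of the windowing...
        | beam.Map(lambda x: x * 10)
        # ...the window only matters here: one sum per 5-second event-time
        # window instead of a single global sum.
        | beam.CombineGlobally(sum).without_defaults()
        | beam.Map(print)
    )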
I have a LeafSystem (using pydrake) with a couple of inputs and an output which is calculated from the inputs. The CalcOutput callback function blocks execution of the program until the output is set. In some cases, I prefer not to set the output even if there is an input (e.g. out-of-limit values).
Is there a way to continue execution, without setting the output?
Drake System's framework uses a "pull" architecture. All of the system evaluations happen in a single thread, and CalcOutput is only called when a downstream method is evaluated which requests an input (e.g. in a downstream CalcOutput or CalcTimeDerivatives). So you need to return some value.
I guess that instead of returning some null value, you probably want to just have the output port continue to have the value it output last time? In that case, the solution is to store the output in a state variable (which means moving the work in CalcOutput into a state update), and then having your output just write the state variable out to the port.
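A minimal sketch of that pattern in pydrake might look like this (the names, the 1-D port, and the limit value are placeholders, and the exact declaration methods may differ slightly between Drake versions):
from pydrake.systems.framework import LeafSystem

class HoldLastValidValue(LeafSystem):
    def __init__(self):
        super().__init__()
        self._u_port = self.DeclareVectorInputPort("u", 1)
        # The "last valid value" lives in discrete state, not in CalcOutput.
        self._state_index = self.DeclareDiscreteState(1)
        # Do the real work in a periodic state update...
        self.DeclarePeriodicDiscreteUpdateEvent(
            period_sec=0.01, offset_sec=0.0, update=self._Update)
        # ...and have the output port simply copy the state out.
        self.DeclareVectorOutputPort("y", 1, self._CalcOutput)

    def _Update(self, context, discrete_state):
        u = self._u_port.Eval(context)[0]
        if abs(u) <= 1.0:  # placeholder limit check
            discrete_state.get_mutable_vector().SetFromVector([u])
        # Out-of-limit input: leave the state (and hence the output) unchanged.

    def _CalcOutput(self, context, output):
        output.SetFromVector(
            context.get_discrete_state_vector().CopyToVector())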
First, read this question:
Repeated task execution using the distributed Dask scheduler
Now, when Dask decides to rerun a task due to work stealing or a task failing (for example, as a result of per-process memory limits), which task result gets passed to the next node of the DAG? We are using nested tasks, e.g.
import dask

@dask.delayed
def add(n):
    return n + 1
t_a = add(1)
t_b = add(t_a)
the_output = add(add(add(t_b)))
So if one of these tasks fails, or gets stolen, and is run twice, which result gets passed to the next node in the DAG?
Further background for those interested:
The reason this has come up is that our task writes to a database. If it runs twice, we get an integrity error because it is trying to insert the same record twice (constrained on id and version in combination). The current plan is to make the task idempotent by catching the integrity error inside the task (roughly as sketched below), but I still don't understand how Dask "chooses" a result.
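For reference, the idempotent write could look roughly like this; sqlite is used purely for illustration:
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT, version INTEGER, payload TEXT, "
             "PRIMARY KEY (id, version))")

def write_record(record_id, version, payload):
    # Idempotent insert: a duplicated task run hits the (id, version)
    # constraint and the duplicate is simply ignored.
    try:
        with conn:
            conn.execute(
                "INSERT INTO records (id, version, payload) VALUES (?, ?, ?)",
                (record_id, version, payload))
    except sqlite3.IntegrityError:
        pass  # the record was already written by an earlier attempt

write_record("abc", 1, "hello")
write_record("abc", 1, "hello")  # second run is a no-op instead of an error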
If you have a situation like add(add(add(t_b)))
Or more generally
x = add(1)
y = add(x)
z = add(y)
Even though those all use the same function, they are all separate tasks. Dask sees that they have different inputs and so it treats them differently.
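You can see this directly: each call produces its own task with its own key (a quick sketch using the add function from the question):
import dask

@dask.delayed
def add(n):
    return n + 1

x = add(1)
y = add(x)
z = add(y)

# Three distinct task keys, even though the function is the same; only one
# committed result per key ever feeds the next node of the graph.
print(x.key, y.key, z.key)
print(z.compute())  # 4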
So if one of these tasks fails, or gets stolen, and is run twice, which result gets passed to the next node in the DAG?
In all of these cases, there is only one valid result on the cluster at once. A stolen task is only run on the new machine, not the old one. If the result of a task is lost and has to be rerun then only the new value will be present anywhere (the old value was lost, remember).
Is it possible to serialize a Reactor Flux? For example, my Flux is in some state and is currently processing some event, and suddenly the service is terminated. The current state of the Flux is saved to a database or to a file. Then, on restart of the application, I just take all the Fluxes from that file/table and subscribe to them to resume processing from the last state. Is this possible in Reactor?
No, this is not possible. A Flux is not serializable; it is closer to a chain of functions. It doesn't necessarily have state[1], but rather describes what to do given an input (provided by an initial generating Flux)...
So in order to "restart" a Flux, you'd have to actually create a new one that gets fed the remaining input the original one would have received upon service termination.
Thus it would be more up to the source of your data to save the last emitted state and allow restarting a new Flux sequence from there.
[1] Although, depending on what operators you chained in, you could have it impact some external state. In that case things will get more complicated, as you'll have to also persist that state.