What's the difference between a watermark and a trigger in Flink? - stream

I read that, "..The ordering operator has to buffer all elements it receives. Then, when it receives a watermark it can sort all elements that have a timestamp that is lower than the watermark and emit them in the sorted order. This is correct because the watermark signals that not more elements can arrive that would be intermixed with the sorted elements..." - https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams
Hence, it seems that the watermark serves as a signal to the following operator, for beginning processing. I guess, that's what also a Trigger does. What's the difference between the two?

You can think of watermarks as special records that tell an operator what (event-) time it is. When an operator receives a watermark, it compares the watermark with its current time and other watermarks it received from different stream partitions. Depending on the comparison, the operator advances its own clock.
Some operators register timers (windows, time-based joins, custom implementations). An operator triggers a timer when the clock of the operator passes the time for which the timer was registered.
So, watermarks and timers are two different things. Watermarks tell an operator what time it is and the operator triggers a timer at the right point in time.

A Watermark can be thought of as an assertion that an event time stream is now complete up to a particular timestamp. When a Watermark is processed by an operator it will cause the firing of any relevant event time timers. The operators that use EventTimeTimers are EventTimeWindows and ProcessFunctions.
Triggers are part of the window API and define when Windows will produce results. An EventTimeTrigger wraps around an event time timer that is called when an suitably large Watermark is processed, indicating that the window is now complete.

Related

Streaming Beam pipeline with "lookbehind"

I am new to Beam/Dataflow and am trying to figure out if it is suited to this problem. I am trying to keep a running sum of which types of messages are currently backlogged in a queueing system. The system uses a monotonically increasing offset number to order messages: producers learn the number when the send a message, and consumers track the watermark offset as they process each message in FIFO order. This pipeline would have two inputs: counts from the producers and watermarks from the consumers.
The queue producer would regularly flush a batch of count metrics to Beam:
(type1, offset, count)
(type2, offset, count)
...
where the offset was the last offset the producer wrote for typeN, and count is how many typeN messages it enqueued in the current batch period.
The queue consumer will regularly send its latest consumed watermark offset. The effect this should have is to invalidate any counts that have an offset lower than this consumer watermark.
The output of the pipeline is the sum of all counts with a higher offset than the largest consumer watermark yet seen, grouped by message type. (snapshotted every 5 minutes or so.)
(Of course there would be 100k message "types", hundreds of producer servers, occasional 2-hour periods where the consumer doesn't report an advancing watermark, etc.)
Is this doable? That this pipeline would need to maintain and scan an unbounded-ish history of count records is the part that seems maybe unsuited to Beam.
One possible approach would be to model this as two timeseries (left , right) where you want to match left.timestamp <= right.timestamp. You can do this using the State and Timer API.
In order to achieve this unbounded, you will need to be working within a GlobalWindow. Important note in the Global Window there is no expiry of the state, so you will need to make sure to do Garbage Collection on your left and right streams. Also data will arrive in the onprocess unordered, so you will need to make use of Event Time timers to do the actual work.
Very roughly:
onProcess(){
Store data in BagState.
Setup Event time timer to go off
}
OnTimer(){
Do your buiss logic.
}
This is a lot easier with Apache Beam > 2.24.0 as OrderedListState has been added.
Although the timeseries use case is different from the one in this question, this talk from the 2019 Beam summit also has some pointers (but does not make use of OrderedListState, which was not available at the time);
State and Timer API and Timeseries

Combine session and tumbling window: time windows that are aligned to the first event for each key

i read about flink`s window assigners over here: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html#window-assigners , but i cant find any solution for my problem.
as part of my project i need a windowing that the timer will start given the first element of the key and will be closed and set ready for processing after X minutes. for example:
first keyA comes at (hh:mm:ss) 00:00:02, i want all keyA will be windowing until 00:01:02, and then the timer of 1 minutes will start again only when keyA will be given as input.
Is it possible to do something like that in flink? is there a workaround?
hope i made it clear enough.
Implementing keyed windows that are aligned with the first event, rather than with the epoch, is quite difficult, in general, which I believe is why this isn't supported by Flink's window API. The problem is that with an out-of-order stream using event time processing, as earlier events arrive you may need to revise your notion of when the window began, and when it should end. For example, if the first keyA arrives at 00:00:02, but then some time later an event with keyA arrives with a timestamp of 00:00:01, now suddenly the window should end at 00:01:01, rather than 00:01:02. And if the out-of-orderness is large compared to the window length, handling this becomes quite complex -- imagine, for example, that the event from 00:00:01 arrives 2 minutes after the event from 00:00:02.
Rather than trying to implement this with the window API, I would use a KeyedProcessFunction. If you only need to support processing time windows, then these concerns about out-of-orderness do not apply, and the solution can be fairly simple. It suffices to keep one object in keyed state, which might be a list holding all of the events in the window, or a counter or other aggregator, depending on what you're trying to accomplish.
When an event arrives, if the state (for this key) is null, then there is no open window for this key. Initialize the state (i.e., create a new, empty list, or set the counter to zero), and create a Timer to fire at the appropriate time. Then regardless of whether the state had been null, add the incoming event to the state (i.e., append it to the list, or increment the counter).
When the timer fires, emit the window's result and reset the state to null.
If, on the other hand, you want to do this with event time windows, first sort the stream and then use the same approach. Note that you won't be able to handle late events, so plan your watermarking accordingly (reducing the likelihood of late events to a manageable level), or go for a more complex implementation.

Calculating periodic checkpoints for an unbounded stream in Apache Beam/DataFlow

I am using a global unbounded stream in combination with Stateful processing and timers in order to totally order a stream per key by event timestamp. The solution is described with the answer to this question:
Processing Total Ordering of Events By Key using Apache Beam
In order to restart the pipeline after a failure or stopping for some other reason, I need to determine the lowest event timestamp at which we are guaranteed that all other events have been processed downstream. This timestamp can be calculated periodically and persisted to a datastore and used as the input to the source IO (Kinesis) so that the stream can be re-read without having to go back to the beginning. (It is ok for us to have events replayed)
I considered having the stateful transformation emit the lowest processed timestamp as the output when the timer triggers and then combine all the outputs globally to find the minimum value. However, it is not possible to use a Global combine operation because a either a Window or a Trigger must be applied first.
Assuming that my stateful transform emits a Long when the timer fires which represents the smallest timestamp, I am defining the pipeline like this:
p.apply(events)
.apply("StatefulTransform", ParDo.of(new StatefulTransform()))
.apply(Window.<Long>configure().triggering(Repeatedly.forever(AfterFirst.of(
AfterPane.elementCountAtLeast(100),
AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(1))))))
.apply(Combine.globally(new MinLongFn()))
.apply("WriteCheckpoint", ParDo.of(new WriteCheckpoint()));
Will this ensure that the checkpoints will only be written when all of the parallel workers have emitted at least one of their panes? I am concerned that a the combine operation may operate on panes from only some of the workers, e.g. there may be a worker that has either failed or is still waiting for another event to trigger it's timer.
I'm a newbie of the Beam, but according to this blog https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html, Splittable DoFn might be the thing you are looking for!
You could create an SDF to fetch the stream and accept the input element as the start point.

Total aggregate over an unbounded stream in Dataflow

A number of examples show aggregation over windows of an unbounded stream, but suppose we need to get a count-per-key of the entire stream seen up to some point in time. (Think word count that emits totals for everything seen so far rather than totals for each window.)
It seems like this could be a Combine.perKey and a trigger to emit panes at some interval. In this case the window is essentially global, and we emit panes for that same window throughout the life of the job. Is this safe/reasonable, or perhaps there is another way to compute a rolling, total aggregate?
Ryan your solution of using a global window and a periodic trigger is the recommended approach. Just make sure you use accumulation mode on the trigger and not discarding mode. The Triggers page should have more information.
Let us know if you need additional help.

Questions about the nextTuple method in the Spout of Storm stream processing

I am developing some data analysis algorithms on top of Storm and have some questions about the internal design of Storm. I want to simulate a sensor data yielding and processing in Storm, and therefore I use Spout to push sensor data into the succeeding bolts at a constant time interval via setting a sleep method in nextTuple method of Spout. But from the experiment results, it appeared that spout didn't push data at the specified rate. In the experiment, there was no bottleneck bolt in the system.
Then I checked some material about the ack and nextTuple methods of Storm. Now my doubt is if the nextTuple method is called only when the previous tuples are fully processed and acked in the ack method?
If this is true, does it means that I cannot set a fixed time interval to emit data?
Thx a lot!
My experience has been that you should not expect Storm to make any real-time guarantees, including in your case the rate of tuple processing. You can certainly write a spout that only emits tuples on some time schedule, but Storm can't really guarantee that it will always call on the spout as often as you would like.
Note that nextTuple should be called whenever there is room available for more pending tuples in the topology. If the topology has free capacity, I would expect Storm to try to fill it up if it can with whatever it can get.
I had a similar use-case, and the way I accomplished it is by using TICK_TUPLE
Config tickConfig = new Config();
tickConfig.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 15);
...
...
builder.setBolt("storage_bolt", new S3Bolt(), 4).fieldsGrouping("shuffle_bolt", new Fields("hash")).addConfigurations(tickConfig);
Then in my storage_bolt (note it's written in python, but you will get an idea) i check if message is tick_tuple if it is then execute my code:
def process(self, tup):
if tup.stream == '__tick':
# Your logic that need to be executed every 15 seconds,
# or what ever you specified in tickConfig.
# NOTE: the maximum time is 600 s.
storm.ack(tup)
return

Resources