For some time we have been creating an EmitterProcessor with a built-in sink as follows:
EmitterProcessor<String> emitter = EmitterProcessor.create();
FluxSink<String> sink = emitter.sink(FluxSink.OverflowStrategy.LATEST);
The emitter is then exposed as a publisher using Flux.from:
Flux<String> out = Flux
.from(emitter
.log(log.getName()));
and the sink can be passed around and populated with strings simply by calling next.
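For example (a minimal sketch, assuming some producer code holds the sink reference):
sink.next("hello");
sink.next("world");
sink.complete(); // optional: terminate the sequence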
Now we see that EmitterProcessor is deprecated.
It has all been replaced with Sinks.many(), like this:
Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();
but how do you publish from that?
The answer was to convert the Sinks.Many into a Flux with asFlux():
Flux<String> out = Flux
.from(sink.asFlux()
.log(log.getName()));
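Publishing then goes through the emit API instead of next. A minimal sketch (tryEmitNext returns a Sinks.EmitResult that could be checked; it is ignored here for brevity):
sink.tryEmitNext("hello");
sink.emitNext("world", Sinks.EmitFailureHandler.FAIL_FAST); // variant with an explicit failure handler
sink.tryEmitComplete();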
The asFlux() view is also used for cancellation and termination of the flux:
sink.asFlux().doOnCancel(() -> {
cancelSink(id, request);
});
/* Handle errors, eviction, expiration */
sink.asFlux().doOnTerminate(() -> {
disposeSink(id);
});
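One caveat (and possibly the cause of the update below): doOnCancel and doOnTerminate each return a new Flux, so attaching them to a separate, throw-away sink.asFlux() call has no effect on the Flux that is actually subscribed. A sketch that keeps everything on the single returned chain (cancelSink, disposeSink, id and request as above):
Flux<String> out = sink.asFlux()
.log(log.getName())
.doOnCancel(() -> cancelSink(id, request))
.doOnTerminate(() -> disposeSink(id));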
UPDATE: The cancel and terminate callbacks don't appear to work, per this question.
I have an Apache Beam pipeline which runs on Google Cloud Dataflow. It is a streaming pipeline which receives input messages from Google Cloud PubSub; these are basically JSON arrays of elements to process.
Roughly speaking, the pipeline has these steps:
Deserialize the message into a PCollection<List<T>>.
Split (or explode) the array into a PCollection<T>.
A few processing steps: some elements will finish before others, and some elements are cached, so they simply skip to the end without much processing at all.
Flatten all outputs and apply a GroupByKey (this is the problem step): it transforms the PCollection back into a PCollection<List<T>>, but it doesn't wait for all the elements.
Serialize and publish a PubSub message.
I cannot get the last GroupByKey to group all the elements that were received together. The published message is missing the elements that had to be processed and therefore took longer than those which skipped to the end.
I think this would be straightforward to solve if I could write a custom data-driven trigger, or even if I could dynamically set the trigger AfterPane.elementCountAtLeast() from a customized WindowFn.
It doesn't seem that I can make a custom trigger. But is it possible to somehow dynamically set the trigger for each window?
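For reference, this is roughly how a static element-count trigger would be set up front (just a sketch; NUM_ELEMENTS stands for the per-message element count, which is exactly what I don't know statically):
Window.<String>into(FixedWindows.of(Duration.standardSeconds(1)))
.triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(NUM_ELEMENTS)))
.withAllowedLateness(Duration.standardSeconds(3))
.discardingFiredPanes();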
--
Here is a simplified version of the pipeline I am working on.
I have simplified the input from an array of objects T into a simple array of Integer. I have simulated the keys (or IDs) for these integers. Normally they would be part of the objects.
I also simplified the slow processing step (which really is several steps) into a single step with an artificial delay.
(complete example gist https://gist.github.com/naringas/bfc25bcf8e7aca69f74de719d75525f2 )
PCollection<String> queue = pipeline
.apply("ReadQueue", PubsubIO.readStrings().fromTopic(topic))
.apply(Window
.<String>into(FixedWindows.of(Duration.standardSeconds(1)))
.withAllowedLateness(Duration.standardSeconds(3))
.triggering(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardSeconds(2)))
.discardingFiredPanes());
TupleTag<List<KV<Integer, Integer>>> tagDeserialized = new TupleTag<List<KV<Integer, Integer>>>() {};
TupleTag<Integer> tagDeserializeError = new TupleTag<Integer>() {};
PCollectionTuple imagesInputTuple = queue
.apply("DeserializeJSON", ParDo.of(new DeserializingFn()).withOutputTags(tagDeserialized, TupleTagList.of(tagDeserializeError)));
/*
This is where I think that I must adjust the custom window strategy, set the customized dynamic-trigger
*/
PCollection<KV<Integer, Integer>> images = imagesInputTuple.get(tagDeserialized)
/* I have tried many things
.apply(Window.<List<KV<Integer, Integer>>>into(new GlobalWindows()))
*/
.apply("Flatten into timestamp", ParDo.of(new DoFn<List<KV<Integer, Integer>>, KV<Integer, Integer>>() {
// Flatten and output into same ts
// like Flatten.Iterables() but I set the output window
@ProcessElement
public void processElement(@Element List<KV<Integer, Integer>> input, OutputReceiver<KV<Integer, Integer>> out, @Timestamp Instant ts, BoundedWindow w, PaneInfo p) {
Instant timestamp = w.maxTimestamp();
for (KV<Integer, Integer> el : input) {
out.outputWithTimestamp(el, timestamp);
}
}
}))
.apply(Window.<KV<Integer, Integer>>into(new GlobalWindows()));
TupleTag<KV<Integer, Integer>> tagProcess = new TupleTag<KV<Integer, Integer>>() {};
TupleTag<KV<Integer, Integer>> tagSkip = new TupleTag<KV<Integer, Integer>>() {};
PCollectionTuple preproc = images
.apply("PreProcessingStep", ParDo.of(new SkipOrNotDoFn()).withOutputTags(tagProcess, TupleTagList.of(tagSkip)));
TupleTag<KV<Integer, Integer>> tagProcessed = new TupleTag<KV<Integer, Integer>>() {};
TupleTag<KV<Integer, Integer>> tagError = new TupleTag<KV<Integer, Integer>>() {};
PCollectionTuple processed = preproc.get(tagProcess)
.apply("ProcessingStep", ParDo.of(new DummyDelasyDoFn).withOutputTags(tagProcessed, TupleTagList.of(tagError)));
/* Here, at the "end"
the elements get grouped back
first: join into a PcollectionList and flatten it
second: GroupByKey, which should but doesn't wait for all elements
lastly: serialize and publish (in this case just print out)
*/
PCollection<String> end = PCollectionList.of(preproc.get(tagSkip)).and(processed.get(tagProcessed))
.apply("FlattenUpsert", Flatten.pCollections())
//
.apply("GroupByParentId", GroupByKey.create())
.apply("GroupedValues", Values.create())
.apply("PublishSerialize", ParDo.of(
new DoFn<Object, String>() {
@ProcessElement
public void processElement(ProcessContext pc) {
String output = GSON.toJson(pc.element());
LOG.info("DONE: {}", output);
pc.output(output);
}
}));
// "send the string to pubsub" goes here
I played around a little bit with stateful pipelines. As you'd like to use data-driven triggers or AfterPane.elementCountAtLeast(), I assume you know the number of elements that make up each message (or, at least, that it does not change per key), so I defined NUM_ELEMENTS = 10 in my case.
The main idea of my approach is to keep track of the number of elements that I have seen so far for a particular key. Notice that I had to merge the PreProcessingStep and ProcessingStep into a single one for an accurate count. I understand this is just a simplified example so I don't know how that would translate to the real scenario.
In the stateful ParDo I defined two state variables, one BagState with all integers seen and a ValueState to count the number of errors:
// A state bag holding all elements seen for that key
#StateId("elements_seen")
private final StateSpec<BagState<Integer>> elementSpec =
StateSpecs.bag();
// A state cell holding error count
#StateId("errors")
private final StateSpec<ValueState<Integer>> errorSpec =
StateSpecs.value(VarIntCoder.of());
Then we process each element as usual but we don't output anything yet unless it's an error. In that case we update the error counter before emitting the element to the tagError side output:
errors.write(firstNonNull(errors.read(), 0) + 1);
is_error = true;
output.get(tagError).output(input);
We update the count and, for successfully processed or skipped elements (i.e. !is_error), write the newly observed element into the BagState:
int count = firstNonNull(Iterables.size(state.read()), 0) + firstNonNull(errors.read(), 0);
if (!is_error) {
state.add(input.getValue());
count += 1;
}
Then, if the sum of successfully processed elements and errors is equal to NUM_ELEMENTS (we are simulating a data-driven trigger here), we flush all the items from the BagState:
if (count >= NUM_ELEMENTS) {
Iterable<Integer> all_elements = state.read();
Integer key = input.getKey();
for (Integer value : all_elements) {
output.get(tagProcessed).output(KV.of(key, value));
}
}
Note that at this point we could already group the values and emit just a single KV<Integer, Iterable<Integer>> instead. I used a for loop to avoid changing other steps downstream.
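Putting the snippets together, the stateful DoFn looks roughly like this (a sketch only; the class name and the simulateStep helper are my own inventions, and NUM_ELEMENTS, tagProcessed and tagError are assumed to be reachable, e.g. as static fields):
class CountingStatefulDoFn extends DoFn<KV<Integer, Integer>, KV<Integer, Integer>> {
    // A state bag holding all elements seen for that key
    @StateId("elements_seen")
    private final StateSpec<BagState<Integer>> elementSpec = StateSpecs.bag();
    // A state cell holding error count
    @StateId("errors")
    private final StateSpec<ValueState<Integer>> errorSpec = StateSpecs.value(VarIntCoder.of());

    @ProcessElement
    public void processElement(
            @Element KV<Integer, Integer> input,
            @StateId("elements_seen") BagState<Integer> state,
            @StateId("errors") ValueState<Integer> errors,
            MultiOutputReceiver output) {
        // Merged PreProcessingStep + ProcessingStep; simulateStep stands in for the real work
        boolean is_error = simulateStep(input);
        if (is_error) {
            errors.write(firstNonNull(errors.read(), 0) + 1);
            output.get(tagError).output(input);
        }
        // BagState.read() returns an empty Iterable when nothing has been added yet
        int count = Iterables.size(state.read()) + firstNonNull(errors.read(), 0);
        if (!is_error) {
            state.add(input.getValue());
            count += 1;
        }
        // Simulated data-driven trigger: flush once all NUM_ELEMENTS are accounted for
        if (count >= NUM_ELEMENTS) {
            for (Integer value : state.read()) {
                output.get(tagProcessed).output(KV.of(input.getKey(), value));
            }
        }
    }
}
(firstNonNull and Iterables come from Guava; the state-spec declarations are the ones shown above.)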
With this, I publish a message such as:
gcloud pubsub topics publish streamdemo --message "[1,2,3,4,5,6,7,8,9,10]"
And where before I got:
INFO: DONE: [4,8]
Now I get:
INFO: DONE: [1,2,3,4,5,6,8,9,10]
Element 7 is not present, as it is the one that simulates errors.
Tested with DirectRunner and 2.16.0 SDK. Full code here.
Let me know if that works for your use case, keep in mind that I only did some minor tests.
I have a Pub/Sub topic + subscription and want to consume and aggregate the unbounded data from the subscription in a Dataflow. I use a fixed window and write the aggregates to BigQuery.
Reading and writing (without windowing and aggregation) works fine. But when I pipe the data into a fixed window (to count the elements in each window) the window is never triggered. And thus the aggregates are not written.
Here is my word publisher (it uses kinglear.txt from the examples as input file):
public static class AddCurrentTimestampFn extends DoFn<String, String> {
@ProcessElement public void processElement(ProcessContext c) {
c.outputWithTimestamp(c.element(), new Instant(System.currentTimeMillis()));
}
}
public static class ExtractWordsFn extends DoFn<String, String> {
@ProcessElement public void processElement(ProcessContext c) {
String[] words = c.element().split("[^a-zA-Z']+");
for (String word:words){ if(!word.isEmpty()){ c.output(word); }}
}
}
// main:
Pipeline p = Pipeline.create(o); // 'o' are the pipeline options
p.apply("ReadLines", TextIO.Read.from(o.getInputFile()))
.apply("Lines2Words", ParDo.of(new ExtractWordsFn()))
.apply("AddTimestampFn", ParDo.of(new AddCurrentTimestampFn()))
.apply("WriteTopic", PubsubIO.Write.topic(o.getTopic()));
p.run();
Here is my windowed word counter:
Pipeline p = Pipeline.create(o); // 'o' are the pipeline options
BigQueryIO.Write.Bound tablePipe = BigQueryIO.Write.to(o.getTable(o))
.withSchema(o.getSchema())
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND);
Window.Bound<String> w = Window
.<String>into(FixedWindows.of(Duration.standardSeconds(1)));
p.apply("ReadTopic", PubsubIO.Read.subscription(o.getSubscription()))
.apply("FixedWindow", w)
.apply("CountWords", Count.<String>perElement())
.apply("CreateRows", ParDo.of(new WordCountToRowFn()))
.apply("WriteRows", tablePipe);
p.run();
The above subscriber will not work, since the window does not seem to trigger using the default trigger. However, if I manually define a trigger the code works and the counts are written to BigQuery.
Window.Bound<String> w = Window.<String>into(FixedWindows.of(Duration.standardSeconds(1)))
.triggering(AfterProcessingTime
.pastFirstElementInPane()
.plusDelayOf(Duration.standardSeconds(1)))
.withAllowedLateness(Duration.ZERO)
.discardingFiredPanes();
I'd like to avoid specifying custom triggers if possible.
Questions:
Why does my solution not work with Dataflow's default trigger?
How do I have to change my publisher or subscriber to trigger windows using the default trigger?
How are you determining the trigger never fires?
Your PubSubIO.Write and PubSubIO.Read transforms should both specify a timestamp label using withTimestampLabel, otherwise the timestamps you've added will not be written to PubSub and the publish times will be used.
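For instance (a sketch only; the exact method name depends on the SDK version: timestampLabel in the Dataflow SDK used here, withTimestampAttribute in recent Beam releases; the attribute name "ts" is just an example):
// publisher
.apply("WriteTopic", PubsubIO.Write.topic(o.getTopic()).timestampLabel("ts"));
// subscriber
.apply("ReadTopic", PubsubIO.Read.subscription(o.getSubscription()).timestampLabel("ts"))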
Either way, the input watermark of the pipeline will be derived from the timestamps of the elements waiting in PubSub. Once all inputs have been processed, it will stay back for a few minutes (in case there was a delay in the publisher) before advancing to real time.
What you are likely seeing is that all the elements are published in the same ~1 second window (since the input file is pretty small). These are all read and processed relatively quickly, but the 1-second window they are put in will not trigger until after the input watermark has advanced, indicating that all data in that 1-second window has been consumed.
This won't happen for several minutes, which may make it look like the trigger isn't working. The trigger you wrote fires after 1 second of processing time, which is much earlier, but there is no guarantee that all the data has been processed by then.
Steps to get better behavior from the default trigger:
Use withTimestampLabel on both the write and read pubsub steps.
Have the publisher spread the timestamps out further (e.g., run for several minutes and spread the timestamps across that range).
I am attempting to write a small demo program that has two cuda streams progressing and, governed by events, waiting for each other. So far this program looks like this:
// event.cu
#include <iostream>
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda.h>
using namespace std;
__global__ void k_A1() { printf("\tHi! I am Kernel A1.\n"); }
__global__ void k_B1() { printf("\tHi! I am Kernel B1.\n"); }
__global__ void k_A2() { printf("\tHi! I am Kernel A2.\n"); }
__global__ void k_B2() { printf("\tHi! I am Kernel B2.\n"); }
int main()
{
cudaStream_t streamA, streamB;
cudaEvent_t halfA, halfB;
cudaStreamCreate(&streamA);
cudaStreamCreate(&streamB);
cudaEventCreate(&halfA);
cudaEventCreate(&halfB);
cout << "Here is the plan:" << endl <<
"Stream A: A1, launch 'HalfA', wait for 'HalfB', A2." << endl <<
"Stream B: Wait for 'HalfA', B1, launch 'HalfB', B2." << endl <<
"I would expect: A1,B1, (A2 and B2 running concurrently)." << endl;
k_A1<<<1,1,0,streamA>>>(); // A1!
cudaEventRecord(halfA,streamA); // StreamA triggers halfA!
cudaStreamWaitEvent(streamA,halfB,0); // StreamA waits for halfB.
k_A2<<<1,1,0,streamA>>>(); // A2!
cudaStreamWaitEvent(streamB,halfA,0); // StreamB waits, for halfA.
k_B1<<<1,1,0,streamB>>>(); // B1!
cudaEventRecord(halfB,streamB); // StreamB triggers halfB!
k_B2<<<1,1,0,streamB>>>(); // B2!
cudaEventDestroy(halfB);
cudaEventDestroy(halfA);
cudaStreamDestroy(streamB);
cudaStreamDestroy(streamA);
cout << "All has been started. Synchronize!" << endl;
cudaDeviceSynchronize();
return 0;
}
My grasp of CUDA streams is the following: A stream is a kind of list to which I can add tasks. These tasks are tackled in series. So in my program I can rest assured that streamA would in order
Call kernel k_A1
Trigger halfA
Wait for someone to trigger halfB
Call kernel k_A2
and streamB would
Wait for someone to trigger halfA
Call kernel k_B1
Trigger halfB
Call kernel k_B2
Normally both streams might run asynchronously with respect to each other. However, I would like to block streamB until A1 is done, and then block streamA until B1 is done.
This appears not to be that simple. On my Ubuntu machine with a Tesla M2090 (CC 2.0), the output of
nvcc -arch=sm_20 event.cu && ./a.out
is
Here is the plan:
Stream A: A1, launch 'HalfA', wait for 'HalfB', A2.
Stream B: Wait for 'HalfA', B1, launch 'HalfB', B2.
I would expect: A1,B1, (A2 and B2 running concurrently).
All has been started. Synchronize!
Hi! I am Kernel A1.
Hi! I am Kernel A2.
Hi! I am Kernel B1.
Hi! I am Kernel B2.
And I really would have expected B1 to be completed before the cudaEventRecord(halfB,streamB). Nevertheless, stream A obviously does not wait for the completion of B1, and therefore not for the recording of halfB.
What's more: if I delete the cudaEventRecord calls altogether, I would expect the program to deadlock on the cudaStreamWaitEvent calls. But it does not, and it produces the same output. What am I overlooking here?
I think this is because cudaStreamWaitEvent(streamA,halfB,0); was called before halfB was recorded (cudaEventRecord(halfB,streamB);). The cudaStreamWaitEvent call most likely looked for a completed recording of halfB issued before it; since none was found, it quietly moved on. See the following documentation:
The stream stream will wait only for the completion of the most recent host call to cudaEventRecord() on event. Once this call has returned, any functions (including cudaEventRecord() and cudaEventDestroy()) may be called on event again, and the subsequent calls will not have any effect on stream.
I could not find a solution if you have to keep the depth-first (stream-by-stream) ordering of the calls; however, the following code may lead to what you want:
k_A1<<<1,1,0,streamA>>>(d); // A1!
cudaEventRecord(halfA,streamA); // StreamA triggers halfA!
cudaStreamWaitEvent(streamB,halfA,0); // StreamB waits, for halfA.
k_B1<<<1,1,0,streamB>>>(d); // B1!
cudaEventRecord(halfB,streamB); // StreamB triggers halfB!
cudaStreamWaitEvent(streamA,halfB,0); // StreamA waits for halfB.
k_A2<<<1,1,0,streamA>>>(d); // A2!
k_B2<<<1,1,0,streamB>>>(d); // B2!
which is confirmed by the profiling.
Note that I changed the kernel interfaces.
From the docs:
If cudaEventRecord() has not been called on event, this call acts as if the record has already completed, and so is a functional no-op.
https://www.cs.cmu.edu/afs/cs/academic/class/15668-s11/www/cuda-doc/html/group__CUDART__STREAM_gfe68d207dc965685d92d3f03d77b0876.html#gfe68d207dc965685d92d3f03d77b0876
So we need to reorder these lines so that the cudaEventRecord appears in the program before the corresponding cudaStreamWaitEvent. That is, for a wait to actually block on a record, the record must be issued earlier in the host code!
Here's the original code:
k_A1<<<1,1,0,streamA>>>(); // A1!
cudaEventRecord(halfA,streamA); // StreamA triggers halfA!
cudaStreamWaitEvent(streamA,halfB,0); // StreamA waits for halfB.
k_A2<<<1,1,0,streamA>>>(); // A2!
cudaStreamWaitEvent(streamB,halfA,0); // StreamB waits, for halfA.
k_B1<<<1,1,0,streamB>>>(); // B1!
cudaEventRecord(halfB,streamB); // StreamB triggers halfB!
k_B2<<<1,1,0,streamB>>>(); // B2!
We see that the record of halfB is issued on the second-to-last line, but the wait on it is issued above, on the third line. No good. So we re-order. The first thing issued on streamB is its wait on halfA, and our only requirement is that it come after the record of halfA, so that line can move up to be the third line.
Likewise, k_B1 can follow it directly, and then the cudaEventRecord for halfB can be moved up before streamA's wait on it. Hmm, does this prevent deadlock, I wonder?
k_A1<<<1,1,0,streamA>>>(); // A1!
cudaEventRecord(halfA,streamA); // StreamA triggers halfA!
cudaStreamWaitEvent(streamB,halfA,0); // StreamB waits, for halfA.
k_B1<<<1,1,0,streamB>>>(); // B1!
cudaEventRecord(halfB,streamB); // StreamB triggers halfB!
cudaStreamWaitEvent(streamA,halfB,0); // StreamA waits for halfB.
k_A2<<<1,1,0,streamA>>>(); // A2!
k_B2<<<1,1,0,streamB>>>(); // B2!
I am developing an MDI application with the help of wxErlang. I have a parent frame, implemented as a wx_object:
-module(main_frame).
-export([new/0, init/1, handle_call/3, handle_event/2, terminate/2]).
-behaviour(wx_object).
....
And I have a child frame, implemented as wx_object too:
-module(child_frame).
-export([new/2, init/1, handle_call/3, handle_event/2, terminate/2]).
-export([save/1]).
-behaviour(wx_object).
% some public API method
save(Frame) ->
wx_object:call(Frame, save).
....
I want to call save/1 on the active child frame from the parent frame. Here is my code:
ActiveChild = wxMDIParentFrame:getActiveChild(Frame),
case wx:is_null(ActiveChild) of
false ->
child_frame:save(ActiveChild);
_ ->
ignore
end
This code fails because ActiveChild is a #wx_ref{} with state=[], but wx_object:call/2 needs a #wx_ref{} whose state is set to the pid of the process being called. What is the right way to do this? The only thing I can think of is to store a list of all created child frames with their pids in the parent frame and look the pid up in that list, but that seems ugly.
You cannot (currently) get the Erlang object/process back from
wxMDIParentFrame:getActiveChild(Frame).
You will have to keep the Erlang child wx_objects (and which one is active) in your own state,
and probably keep that updated with events.
There is room for improvement here.