Is there a guarantee on the order of arrival for messages sent to a MailboxProcessor, and if not, how can I get one?
That is, if on a single thread I do
agent.Post(msg1)
agent.Post(msg2)
how can I be sure that, in the agent's processing loop, the messages will be received in that order?
They are. As you might guess, the implementation of Post just adds an item to the queue (on the current thread, under a lock) and posts work to notify any waiting agent to wake up and process it. So if you call Post twice on the same thread, one after another, the messages get into the queue in that order.
You can also use inbox.Scan, which takes a function that returns an option (for example, function _ -> None), to pick messages out of the queue in a different order if you have some way of detecting that order. Of course, this comes at a price in performance, so leaving the queue alone is the best idea.
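To make the "adds an item to the queue under a lock" point concrete, here is a minimal toy model of it. It is written in Java rather than F#, and ToyAgent, post and drain are made-up names for illustration only; this is not the actual MailboxProcessor implementation, just the same idea in miniature.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Toy agent (hypothetical, not the F# implementation): post() enqueues
// under a lock on the caller's thread, drain() dequeues in FIFO order.
class ToyAgent<T> {
    private final Object lock = new Object();
    private final Queue<T> queue = new ArrayDeque<>();

    // Runs on the caller's thread, as Post is described to do above.
    void post(T msg) {
        synchronized (lock) {
            queue.add(msg);
        }
    }

    // Stand-in for the agent's processing loop: returns the queued
    // messages in the order they were enqueued.
    List<T> drain() {
        List<T> received = new ArrayList<>();
        synchronized (lock) {
            while (!queue.isEmpty()) {
                received.add(queue.remove());
            }
        }
        return received;
    }
}

class ToyAgentDemo {
    public static void main(String[] args) {
        ToyAgent<String> agent = new ToyAgent<>();
        agent.post("msg1");
        agent.post("msg2");
        System.out.println(agent.drain()); // prints [msg1, msg2]
    }
}
```

Because both calls happen one after the other on the same thread, the second enqueue cannot start before the first has finished, which is all the ordering guarantee amounts to.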
I am working on a test program which subscribes to one topic published by the main program. Only a single message is ever published on this topic. If my subscriber is not alive when the publisher publishes that message, the message is lost. One way to avoid this is to put a while loop in the main program that waits until the number of subscribers is nonzero, but I cannot put a while loop inside my main program. How do I achieve this?
Since you say that only a single message is sent by the main program, this is probably a job for the latch argument when you allocate the publisher:
latch [optional]
Enables "latching" on a connection. When a connection is latched, the last message published is saved and automatically sent to any future subscribers that connect. This is useful for slow-changing to static data like a map. Note that if there are multiple publishers on the same topic, instantiated in the same node, then only the last published message from that node will be sent, as opposed to the last published message from each publisher on that single topic.
Just give your publisher the argument as described here, and ROS handles the message passing to your subscriber when it comes up:
...
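// nh is your ros::NodeHandle; the message type and queue size of 1 are placeholders
bool latch = true;
ros::Publisher pub = nh.advertise<std_msgs::String>(topic, 1, latch);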
...
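If it helps to see what latching does, here is a rough toy model of the behaviour quoted above. It is plain Java, not ROS code, and LatchedTopic, publish and subscribe are invented names; it only shows the "last message is saved and sent to future subscribers" idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of a latched topic (hypothetical, not the ROS API): the last
// published message is kept and replayed to subscribers that connect later.
class LatchedTopic<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();
    private T last;          // last published message, if any
    private boolean hasLast; // false until the first publish

    synchronized void publish(T msg) {
        last = msg;
        hasLast = true;
        for (Consumer<T> s : subscribers) {
            s.accept(msg);
        }
    }

    synchronized void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
        if (hasLast) {
            // A late subscriber still receives the saved message.
            subscriber.accept(last);
        }
    }
}

class LatchedTopicDemo {
    public static void main(String[] args) {
        LatchedTopic<String> topic = new LatchedTopic<>();
        topic.publish("the only message");                     // nobody is listening yet
        topic.subscribe(m -> System.out.println("got: " + m)); // prints got: the only message
    }
}
```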
Problem Context
I am trying to generate a total (linear) order of event items per key from a real-time stream where the order is event time (derived from the event payload).
Approach
I had attempted to implement this using streaming as follows:
1) Set up non-overlapping sequential windows, e.g. with a duration of 5 minutes
2) Establish an allowed lateness - it is fine to discard late events
3) Set accumulation mode to retain all fired panes
4) Use the "AfterWatermark" trigger
5) When handling a triggered pane, only consider the pane if it is the final one
6) Use GroupBy.perKey to ensure all events in this window for this key will be processed as a unit on a single resource
While this approach ensures linear order for each key within a given window, it does not make that guarantee across multiple windows. For example, a later window of events for a key could be processed at the same time as an earlier window for that key; this could easily happen if the first window failed and had to be retried.
I'm considering adapting this approach where the realtime stream can first be processed so that it partitions the events by key and writes them to files named by their window range.
Due to the parallel nature of Beam processing, these files will also be generated out of order.
A single process coordinator could then submit these files sequentially to a batch pipeline - only submitting the next one when it has received the previous file and that downstream processing of it has completed successfully.
The problem is that Apache Beam will only fire a pane if there was at least one element in that time window. Thus if there are gaps in events then there could be gaps in the files that are generated, i.e. missing files. The problem with missing files is that the coordinating batch processor cannot distinguish between a time window that passed with no data and a failure, in which case it cannot proceed until the file finally arrives.
One way to force every window to trigger might be to add dummy events to the stream for each partition and time window. However, this is tricky to do: if there are large gaps in the time sequence and the dummy events arrive surrounded by much later events, they will be discarded as late.
Are there other approaches to ensuring there is a trigger for every possible event window, even if that results in outputting empty files?
Is generating a total ordering by key from a realtime stream a tractable problem with Apache Beam? Is there another approach I should be considering?
Depending on your definition of tractable, it is certainly possible to totally order a stream per key by event timestamp in Apache Beam.
Here are the considerations behind the design:
Apache Beam does not guarantee in-order transport, so an ordering is of no use within a pipeline. So I will assume you are doing this so you can write to an external system that can only handle things if they arrive in order.
If an event has timestamp t, you can never be certain no earlier event will arrive unless you wait until t is droppable.
So here's how we'll do it:
We'll write a ParDo that uses state and timers (blog post still under review) in the global window. This makes it a per-key workflow.
We'll buffer elements in state when they arrive. So your allowed lateness affects how efficient of a data structure you need. What you need is a heap to peek and pop the minimum timestamp and element; there's no built-in heap state so I'll just write it as a ValueState.
We'll set an event-time timer to receive a callback when an element's timestamp can no longer be contradicted.
I'm going to assume a custom EventHeap data structure for brevity. In practice, you'd want to break this up into multiple state cells to minimize the data transferred. A heap might be a reasonable addition to primitive types of state.
I will also assume that all the coders we need are already registered and focus on the state and timers logic.
new DoFn<KV<K, Event>, Void>() {
  @StateId("heap")
  private final StateSpec<ValueState<EventHeap>> heapSpec = StateSpecs.value();
  @TimerId("next")
  private final TimerSpec nextTimerSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);
  @ProcessElement
  public void process(
      ProcessContext ctx,
      @StateId("heap") ValueState<EventHeap> heapState,
      @TimerId("next") Timer nextTimer) {
    EventHeap heap = firstNonNull(
        heapState.read(),
        EventHeap.createForKey(ctx.element().getKey()));
    heap.add(ctx.element().getValue());
    // Persist the updated heap; a ValueState is not written back implicitly
    heapState.write(heap);
    // When the watermark reaches this time, no more elements
    // can show up that have earlier timestamps
    nextTimer.set(heap.nextTimestamp().plus(allowedLateness));
  }
  @OnTimer("next")
  public void onNextTimestamp(
      OnTimerContext ctx,
      @StateId("heap") ValueState<EventHeap> heapState,
      @TimerId("next") Timer nextTimer) {
    EventHeap heap = heapState.read();
    // If the timer at time t was delivered the watermark must
    // be strictly greater than t, so only elements at or before
    // t - allowedLateness can no longer be preceded by a new arrival
    Instant cutoff = ctx.timestamp().minus(allowedLateness);
    // isEmpty() is assumed to exist on the hypothetical EventHeap
    while (!heap.isEmpty() && !heap.nextTimestamp().isAfter(cutoff)) {
      writeToExternalSystem(heap.pop());
    }
    heapState.write(heap);
    // Re-arm the timer only if something is still buffered
    if (!heap.isEmpty()) {
      nextTimer.set(heap.nextTimestamp().plus(allowedLateness));
    }
  }
}
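For completeness, the EventHeap assumed above could be sketched roughly as follows. This is only one possible shape: the Event fields, the use of plain epoch milliseconds instead of Beam's Joda-Time Instant, and the isEmpty and drainUpTo helpers are all my own assumptions, and createForKey and the coder are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical event type; timestamps are epoch milliseconds here,
// whereas Beam itself works with Joda-Time Instants.
class Event {
    final long timestampMillis;
    final String payload;

    Event(long timestampMillis, String payload) {
        this.timestampMillis = timestampMillis;
        this.payload = payload;
    }
}

// Hypothetical EventHeap: a min-heap ordered by event timestamp.
class EventHeap {
    private final PriorityQueue<Event> heap =
        new PriorityQueue<>(Comparator.comparingLong((Event e) -> e.timestampMillis));

    void add(Event e) { heap.add(e); }

    boolean isEmpty() { return heap.isEmpty(); }

    // Smallest buffered timestamp; only call this when the heap is non-empty.
    long nextTimestamp() { return heap.peek().timestampMillis; }

    // Removes and returns the earliest buffered event.
    Event pop() { return heap.remove(); }
}

class EventHeapDemo {
    // Pops every event whose timestamp is <= cutoffMillis, earliest first.
    static List<String> drainUpTo(EventHeap heap, long cutoffMillis) {
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty() && heap.nextTimestamp() <= cutoffMillis) {
            out.add(heap.pop().payload);
        }
        return out;
    }

    public static void main(String[] args) {
        EventHeap heap = new EventHeap();
        heap.add(new Event(30, "c"));
        heap.add(new Event(10, "a"));
        heap.add(new Event(20, "b"));
        System.out.println(drainUpTo(heap, 20)); // prints [a, b]
    }
}
```

The drainUpTo loop mirrors what the timer callback does: emit everything up to a cutoff in timestamp order and leave the rest buffered for a later firing.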
This should hopefully get you started on the way towards whatever your underlying use case is.
I am new to Relay and saw this getCollisionKey in the treasurehunt tutorial:
getCollisionKey() {
return `check_${this.props.game.id}`;
}
The docs state: "Implement this method to return a collision key. Relay will send any mutations having the same collision key to the server serially and in-order."
Please help me understand what getCollisionKey is. I would really appreciate it.
collisionKey is an identifier that tells Relay when mutations need to be executed one after the other and when they can be parallelised.
We need this mostly because of network inconsistencies.
Take for example a mutation LikeOrUnlikePost. This mutation likes or unlikes the post depending on whether you already like it or not.
Suppose you like the post, then 1s later you decide to unlike it.
But the first mutation fails, so it never reaches your server, and only one LikeOrUnlikePost mutation actually arrives.
The result is that you think you unliked the post (you clicked twice), but in fact you only liked it (only one mutation succeeded).
This is what collisionKey is for. It tells Relay to queue any mutations which have the same collision key.
In the case above, what would happen is the second mutation would get queued, and would never get executed as the first one fails.
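Here is a small toy model of that queueing behaviour, in case it makes it clearer. It is plain Java and has nothing to do with Relay's real implementation; CollisionQueue, enqueue and flush are names I made up, and a mutation is modelled as a function that returns true on success.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Toy model (hypothetical, not Relay's code): mutations sharing a collision
// key run one after another; if one fails, those queued behind it are dropped.
class CollisionQueue {
    private final Map<String, List<BooleanSupplier>> pending = new HashMap<>();

    // Queue a mutation behind any earlier mutation with the same key.
    void enqueue(String collisionKey, BooleanSupplier mutation) {
        pending.computeIfAbsent(collisionKey, k -> new ArrayList<>()).add(mutation);
    }

    // Runs the queued mutations for one key in order and stops at the first
    // failure. Returns how many mutations succeeded.
    int flush(String collisionKey) {
        List<BooleanSupplier> queue = pending.remove(collisionKey);
        if (queue == null) {
            return 0;
        }
        int succeeded = 0;
        for (BooleanSupplier mutation : queue) {
            if (!mutation.getAsBoolean()) {
                break; // later mutations with this key never execute
            }
            succeeded++;
        }
        return succeeded;
    }
}

class CollisionQueueDemo {
    public static void main(String[] args) {
        CollisionQueue queue = new CollisionQueue();
        queue.enqueue("check_1", () -> false); // the "like" fails
        queue.enqueue("check_1", () -> true);  // the "unlike" is queued behind it
        System.out.println(queue.flush("check_1")); // prints 0
    }
}
```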
I want to model a queue with vacations. When the queue is empty, the server takes a vacation whose length follows a certain distribution. (I can use a gate to block the server.) So I need to get the number of entities in the Queue block. Could you please tell me how to do that?
Many thanks.
The "number of entities in the Queue" can be found in the 'Statistics' tab of the Queue's properties.
Enabling it (clicking its checkbox) will expose the signal of interest on the block (#n), which can then be connected to other Simulink blocks.
Connect the #n signal to a "Compare To Constant" block to create a boolean signal that indicates whether the queue is empty.
So the task is the following:
1) I have a track ID, I need to ask the server for all the track data
2) parse the response (here I also get an album ID)
3) now I have an album ID, I need to ask the server for all the album data
4) parse the response (here I also get an artist ID)
5) now I have an artist ID, I need to ask the server for all the artist data
I wonder what the right way to do this with GCD is. Three dispatch_syncs inside a dispatch_async?
I want all this to be one operation, run in the background, so at first I thought about NSOperation. But all the callbacks, parsing and saving to Core Data need to happen on a background thread, so I'd have to create a separate run loop for callbacks to make sure the thread is not killed before I get a response and does not block the UI.
So the question is: how should I use GCD here, or is it better to go with NSOperation and a run loop thread for callbacks? Thanks.
I would suggest using NSOperation and callbacks executed on the main thread.
If you think about it, your workflow is pretty sequential: 1 -> 3 -> 5. The parsing steps (2 and 4) are presumably not expensive enough to be worth executing on a separate thread (I guess they are not expensive at all, and you can disregard parsing time compared to the time spent waiting for network communication).
Furthermore, if you use a communication framework like AFNetworking (or even NSURLConnection + blocks) your workflow will be pretty easy to implement:
retrieve track data
in "retrieve track data" response handler, get album id, then send new request for "album data";
in "retrieve album data" response handler, get artist id, and so on...
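The chaining above is the same whatever the language. Just to show the shape, here is a minimal sketch in Java rather than Objective-C, using CompletableFuture in place of AFNetworking response handlers. MusicApi and its three fetch methods are hypothetical stand-ins that return canned strings instead of talking to a server.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-ins for the three server calls. Each one "parses" its
// response down to the ID the next step needs; real code would complete the
// future from a network callback instead of returning a canned string.
class MusicApi {
    // Steps 1-2: fetch track data, extract the album ID.
    static CompletableFuture<String> fetchTrack(String trackId) {
        return CompletableFuture.supplyAsync(() -> "album-of-" + trackId);
    }

    // Steps 3-4: fetch album data, extract the artist ID.
    static CompletableFuture<String> fetchAlbum(String albumId) {
        return CompletableFuture.supplyAsync(() -> "artist-of-" + albumId);
    }

    // Step 5: fetch the artist data.
    static CompletableFuture<String> fetchArtist(String artistId) {
        return CompletableFuture.supplyAsync(() -> "data-for-" + artistId);
    }

    // Each request starts only once the previous response has arrived,
    // and no thread blocks while waiting.
    static CompletableFuture<String> loadArtistForTrack(String trackId) {
        return fetchTrack(trackId)
            .thenCompose(MusicApi::fetchAlbum)
            .thenCompose(MusicApi::fetchArtist);
    }
}

class MusicApiDemo {
    public static void main(String[] args) {
        // join() blocks only here, in the demo, to print the final result.
        System.out.println(MusicApi.loadArtistForTrack("t1").join());
        // prints data-for-artist-of-album-of-t1
    }
}
```

The point is that the sequencing lives in the response handlers, so there is no need for nested synchronous dispatches or a dedicated run loop thread.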