Dataflow/Apache Beam: How can I access state from #FinishBundle? - google-cloud-dataflow

Our pipeline buffers events and does an external fetch (for enrichment) every 500 events. When a timer is fired, these events are then processed when a timer fires. Of course, when you have e.g. 503 events, there will be 3 events that were not enriched.
From experiments we learned that #FinishBundle is always called before the timer. It even seems the result of the bundle in committed before the timer executed (checkpointing?). If we could access the state from #FinishBundle and perform an enrichment on these last events, they would be part of the committed bundle.
I believe this would solve our exactly-once problem: currently the timer also needs to fetch and will do this again upon re-execution. When we could adjust the state in the #FinishBundle the fetch is no longer needed and when the timer re-executes it will start from the state.
Apparently, it is not possible to access the state from the #FinishBundle function, as the following gives errors:
#FinishBundle
public void finishBundle(FinishBundleContext c,
#StateId("buffer") BagState<AwesomeEvent> bufferState) {
LOG.info("--- FINISHBUNDLE CALLED ---");
// TODO: ENRICHMENT
LOG.info("--- FINISHBUNDLE DONE ---");
}
Am I doing something wrong or is there another way of accessing the state from this function?
Also, am I making the correct assessment about the timer behavior?

Related

Sequence operations

I have a little card playing app. When the computer is playing, some functions are being fired after some time to mimic a real player, like so:
self.operation.addOperation {
DispatchQueue.main.asyncAfter(deadline: DispatchTime.now()+2) {
self.passTurn()
}
}
Everything works fine, but when I want to cancel the game and start a new one, the app crashes if I do it within two seconds before the self.passTurn() function fires.
Ideally I would like to have some sort of pause between the different operations. The above mentioned operation gets released from the queue immediately after it fires the delayed function, so cancelling or suspends the operation does not work.
Is it possible to somehow retain the operation in the queue for two seconds and release it afterwards when the actual function is done?
Thanks!

When does code in a service worker outside of an event handler run?

(I am paraphrasing question asked by Rich Harris in the "Stuff I wish I'd known sooner about service workers" gist.)
If I have code in my service worker that runs outside an event handler, when does it run?
And, closely related to that, what is the difference between putting inside an install handler and putting it outside an event handler entirely?
In general, code that's outside any event handler, in the "top-level" of the service worker's global scope, will run each and every time the service worker thread(/process) is started up. The service worker thread may start (and stop) at arbitrary times, and it's not tied to the lifetime of the web pages it controlled.
(Starting/stopping the service worker thread frequently is a performance/battery optimization, and ensures that, e.g., just because you browse to a page that has registered a service worker, you won't get an extra idle thread spinning in the background.)
The flip side of that is that every time the service worker thread is stopped, any existing global state is destroyed. So while you can make certain optimizations, like storing an open IndexedDB connection in global state in the hopes of sharing it across multiple events, you need to be prepared to re-initialize them if the thread had been killed in between event handler invocations.
Closely related to this question is a misconception I've seen about the install event handler. I have seen some developers use the install handler to initialize global state that they then rely on in other event handlers, like fetch. This is dangerous, and will likely lead to bugs in production. The install handler fires once per version of a service worker, and is normally best used for tasks that are tied to service worker versioning—like caching new or updated resources that are needed by that version. After the install handler has completed successfully, a given version of a service worker will be considered "installed", and the install handler won't be triggered again when the service worker starts up to handle, e.g., a fetch or message event.
So, if there is global state that needs to be initialized prior to handling, e.g., a fetch event, you can do that in the top-level service worker global scope (optionally waiting on a promise to resolve inside the fetch event handler to ensure that any asynchronous operations have completed). Do not rely on the install handler to set up global scope!
Here's an example that illustrates some of these points:
// Assume this code lives in service-worker.js
// This is top-level code, outside of an event handler.
// You can use it to manage global state.
// _db will cache an open IndexedDB connection.
let _db;
const dbPromise = () => {
if (_db) {
return Promise.resolve(_db);
}
// Assume we're using some Promise-friendly IndexedDB wrapper.
// E.g., https://www.npmjs.com/package/idb
return idb.open('my-db', 1, upgradeDB => {
return upgradeDB.createObjectStore('key-val');
}).then(db => {
_db = db;
return db;
});
};
self.addEventListener('install', event => {
// `install` is fired once per version of service-worker.js.
// Do **not** use it to manage global state!
// You can use it to, e.g., cache resources using the Cache Storage API.
});
self.addEventListener('fetch', event => {
event.respondWith(
// Wait on dbPromise to resolve. If _db is already set, because the
// service worker hasn't been killed in between event handlers, the promise
// will resolve right away and the open connection will be reused.
// Otherwise, if the global state was reset, then a new IndexedDB
// connection will be opened.
dbPromise().then(db => {
// Do something with IndexedDB, and eventually return a `Response`.
});
);
});

Usage of FreeRTOS xQueueSelectFromSet and xQueueReceive

I am using FreeRTOS v8.2.3 on a PIC32 micro controller. I have a case where I need to post the following 3 events to 3 corresponding queues from an ISR, in order to unblock a task awaiting one of these events at a time -
a) SETUP packet arrival
b) Transfer completed event 1
c) Transfer completed event 2
My exection sequence and requirement are as follows:
Case 1 (execution is blocked for an event at point_1):
As SETUP arrives while waiting at point_1 of execution -
i) the waiting task should be unblocked
ii)Setup received from queue and processed
Some code is processed and reaches point_2
Case 2 (execution is blocked for an event at point_2):
If any one of SETUP or transfer complete events occur at point_2 -
i) unblock the wait
ii) receive transfer_complete_1 or transfer_complete_2 event from queue to carry out some additional transfers and loop at point_2
iii)if it was a Setup queue event, do not receive, but go to point_1
The code does not seem to work when I try to use xQueueReceive and xQueueSelectFromSet on the Setup queue even when one of them is used at point_1 and the other used at point_2.
But seems to work fine if I use xQueueSelectFromSet at both the places and verify the queuset member handle that caused the event to proceed further.
Given the requirement above, the problem with using xQueueSelectFromSet at both the places is that
- the xQueueSelectFromSet call will be placed back to back, first on a Setup event at point_2 and then immediately on point_1 which is not intentional
- the xQueueSelectFromSet call at point_1 is also not desired
Hence can anyone please explain whether and how to use both a queueset and queuereceive on the same queue? If not possible how do we typically implement the above requirement in FreeRTOS?
This is a duplicate of a question asked on the FreeRTOS support forum, so below is a duplicate of the answer I gave there:
I don't fully understand your usage scenario, but some points which may help.
1) If a queue is a member of a queue set, then the queue can only be read after its handle has been returned from the queue set. Further, if a queue's handle is returned from a queue set then the item must be read from the queue. If either of these requirements are not met then the state of the queue set will not match that of the queues in the set.
2) If the same task is reading from the multiple queues then it is probably not necessary to use a queue set at all. See the "alternatives to using queue sets" section on the following page: http://www.freertos.org/Pend-on-multiple-rtos-objects.html

Stream function calls are async in Google Dart?

Why dart calls my function "aFunction" after Step2? If I execute this code this text below in console:
Step2
Step1
My code:
void main()
{
...
stream.listen(aFunction);
print("Step2");
...
}
void aFunction()
{
print("Step1");
}
Thanks for help.
One of the few promises that a Dart Stream makes is that it generates no events in response to a listen call.
The events may come at a later time, but the code calling 'listen' is allowed to continue, and complete, before the first event is fired.
We originally allowed streams to fire immediately on a listen, but when we tried to program with that, it was completely impossible to control in practice.
The same is true for listening on a future, for example with 'then'. The callback will never come immediately.
Events should generally act as if they were fired by the top-level event loop, so the event handler doesn't have to worry if other code is running - other code that might not be reentrant.
That is not always the case in practice. One event handler may trigger other events through a synchronous stream controller, effectively turning one event into anoter. That requires the event handler to know what it is doing. Synchronous controllers are intended for internal use inside, e.g., a stream transformer, and using a synchronous stream controller isn't recommended in general.
So, no, you can't have the listen call immediately trigger the callback.
You can listen to a stream synchronously if you created a StreamController with the sync option enabled. Here is an example to get what you describe:
var controller = new StreamController<String>(sync: true);
var stream = controller.stream.asBroadcastStream();
stream.listen((text) => print(text));
controller.add("Step1");
print("Step2");

Difference between pthread_exit(PTHREAD_CANCELED) and pthread_cancel(pthread_self())

When pthread_exit(PTHREAD_CANCELED) is called I have expected behavior (stack unwinding, destructors calls) but the call to pthread_cancel(pthread_self()) just terminated the thread.
Why pthread_exit(PTHREAD_CANCELED) and pthread_cancel(pthread_self()) differ significantly and the thread memory is not released in the later case?
The background is as follows:
The calls are made from a signal handler and reasoning behind this strange approach is to cancel a thread waiting for the external library semop() to complete (looping around on EINTR I suppose)
I have noticed that calling pthread_cancel from other thread does not work (as if semop was not a cancellation point) but signalling the thread and then calling pthread_exit works but calls the destructor within a signal handler.
pthread_cancel could postpone the action to the next cancellation point.
In terms of thread specific clean-up behaviour there should be no difference between cancelling a thread via pthread_cancel() and exiting a thread via pthread_exit().
POSIX says:
[...] When the cancellation is acted on, the cancellation clean-up handlers for thread shall be called. When the last cancellation clean-up handler returns, the thread-specific data destructor functions shall be called for thread. When the last destructor function returns, thread shall be terminated.
From Linux's man pthread_cancel:
When a cancellation requested is acted on, the following steps occur for thread (in this order):
Cancellation clean-up handlers are popped (in the reverse of the order in which they were pushed) and called. (See pthread_cleanup_push(3).)
Thread-specific data destructors are called, in an unspecified order. (See pthread_key_create(3).)
The thread is terminated. (See pthread_exit(3).)
Referring the strategy to introduce a cancellation point by signalling a thread, I have my doubts this were the cleanest way.
As many system calls return on receiving a signal while setting errno to EINTR, it would be easy to catch this case and simply let the thread end itself cleanly under this condition via pthread_exit().
Some pseudo code:
while (some condition)
{
if (-1 == semop(...))
{ /* getting here on error or signal reception */
if (EINTR == errno)
{ /* getting here on signal reception */
pthread_exit(...);
}
}
}
Turned out that there is no difference.
However some interesting side effects took place.
Operations on std::iostream especially cerr/cout include cancellation points. When the underlying operation is canceled the stream is marked as not good. So you will get no output from any other thread if only one has discovered cancellation on an attempt to print.
So play with pthread_setcancelstate() and pthread_testcancel() or just call cerr.clear() when needed.
Applies to C++ streams only, stderr,stdin seems not be affected.
First of all, there are two things associated to thread which will tell what to do when you call pthread_cancel().
1. pthread_setcancelstate
2. pthread_setcanceltype
first function will tell whether that particular thread can be cancelled or not, and the second function tells when and how that thread should be cancelled, for example, should that thread be terminated as soon as you send cancellation request or it need to wait till that thread reaches some milestone before getting terminated.
when you call pthread_cancel(), thread wont be terminated directly, above two actions will be performed, i.e., checking whether that thread can be cancelled or not, and if yes, when to cancel.
if you disable cancel state, then pthread_cancel() can't terminate that thread, but the cancellation request will stay in a queue waiting for that thread to become cancellable, i.e., at some point of time if you are enabling cancel state, then your cancel request will start working on terminating that thread
whereas if you use pthread_exit(), then the thread will be terminated irrespective to the cancel state and cancel type of that particular thread.
*this is one of the differences between pthread_exit() and pthread_cancel(), there can be few more.

Resources