When does code in a service worker outside of an event handler run? - service-worker

(I am paraphrasing a question asked by Rich Harris in the "Stuff I wish I'd known sooner about service workers" gist.)
If I have code in my service worker that runs outside an event handler, when does it run?
And, closely related to that, what is the difference between putting code inside an install handler and putting it outside an event handler entirely?

In general, code that's outside any event handler, in the "top-level" of the service worker's global scope, will run each and every time the service worker thread (or process) is started up. The service worker thread may start (and stop) at arbitrary times, and it's not tied to the lifetime of the web pages it controls.
(Starting/stopping the service worker thread frequently is a performance/battery optimization, and ensures that, e.g., just because you browse to a page that has registered a service worker, you won't get an extra idle thread spinning in the background.)
The flip side of that is that every time the service worker thread is stopped, any existing global state is destroyed. So while you can make certain optimizations, like storing an open IndexedDB connection in global state in the hopes of sharing it across multiple events, you need to be prepared to re-initialize that state if the thread has been killed in between event handler invocations.
Closely related to this question is a misconception I've seen about the install event handler. I have seen some developers use the install handler to initialize global state that they then rely on in other event handlers, like fetch. This is dangerous, and will likely lead to bugs in production. The install handler fires once per version of a service worker, and is normally best used for tasks that are tied to service worker versioning—like caching new or updated resources that are needed by that version. After the install handler has completed successfully, a given version of a service worker will be considered "installed", and the install handler won't be triggered again when the service worker starts up to handle, e.g., a fetch or message event.
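As a minimal sketch of the kind of version-tied work that suits the install handler, the snippet below precaches a few files with the Cache Storage API. The cache name, the URL list, and the precache helper are hypothetical placeholders, and the listener is only registered when the code actually runs inside a service worker.

```javascript
// Hypothetical cache name and file list, tied to this version of the worker.
const CACHE_NAME = 'static-v1';
const PRECACHE_URLS = ['/', '/styles.css', '/app.js'];

// Opens the versioned cache and adds every precache URL to it.
// `cacheStorage` is expected to behave like the global `caches` object.
function precache(cacheStorage) {
  return cacheStorage
    .open(CACHE_NAME)
    .then(cache => cache.addAll(PRECACHE_URLS));
}

// Only register the listener when running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('install', event => {
    // Installation only succeeds once the promise passed to waitUntil resolves.
    event.waitUntil(precache(self.caches));
  });
}
```

Nothing here sets up state for later handlers; the only lasting effect is what ends up in Cache Storage, which survives the thread being stopped.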
So, if there is global state that needs to be initialized prior to handling, e.g., a fetch event, you can do that in the top-level service worker global scope (optionally waiting on a promise to resolve inside the fetch event handler to ensure that any asynchronous operations have completed). Do not rely on the install handler to set up global state!
Here's an example that illustrates some of these points:
// Assume this code lives in service-worker.js
// This is top-level code, outside of an event handler.
// You can use it to manage global state.
// _db will cache an open IndexedDB connection.
let _db;
const dbPromise = () => {
  if (_db) {
    return Promise.resolve(_db);
  }
  // Assume we're using some Promise-friendly IndexedDB wrapper.
  // E.g., https://www.npmjs.com/package/idb
  return idb.open('my-db', 1, upgradeDB => {
    return upgradeDB.createObjectStore('key-val');
  }).then(db => {
    _db = db;
    return db;
  });
};
self.addEventListener('install', event => {
  // `install` is fired once per version of service-worker.js.
  // Do **not** use it to manage global state!
  // You can use it to, e.g., cache resources using the Cache Storage API.
});
self.addEventListener('fetch', event => {
  event.respondWith(
    // Wait on dbPromise to resolve. If _db is already set, because the
    // service worker hasn't been killed in between event handlers, the promise
    // will resolve right away and the open connection will be reused.
    // Otherwise, if the global state was reset, then a new IndexedDB
    // connection will be opened.
    dbPromise().then(db => {
      // Do something with IndexedDB, and eventually return a `Response`.
    })
  );
});
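A variation on the same idea, sketched under the assumption that initialization is cheap enough to start eagerly: kick off the asynchronous setup at the top level, and have each handler wait on the resulting promise. initGlobalState and handleFetch are hypothetical names; the initializer stands in for whatever setup (opening IndexedDB, reading configuration) your worker needs.

```javascript
// Counts how often the initializer runs during one worker lifetime.
let initCount = 0;

// Hypothetical async initializer; replace with your real setup work.
function initGlobalState() {
  initCount += 1;
  return Promise.resolve({ initializedAt: Date.now() });
}

// Top-level code: runs once per service worker startup, not once per event.
const statePromise = initGlobalState();

// Kept as a plain function so the logic can be exercised outside a worker.
async function handleFetch(url) {
  // Resolves right away if initialization has already finished.
  const state = await statePromise;
  return `${url} served using state from ${state.initializedAt}`;
}

// Only register the listener when running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', event => {
    event.respondWith(
      handleFetch(event.request.url).then(body => new Response(body))
    );
  });
}
```

Because the top-level code reruns whenever the worker is started, statePromise is rebuilt automatically after the thread has been killed; nothing depends on install having fired during the same lifetime.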

Related

How to restore runOn Scheduler used in previous operator?

Folks, is it possible to obtain currently used Scheduler within an operator?
The problem that I have is that Mono.fromFuture() is being executed on a native thread (AWS CRT Http Client in my case). As a result, all subsequent operators are also executed on that thread. Later code then tries to obtain the context class loader, which is null on that thread. I realize that I can call .publishOn(originalScheduler) after .fromFuture(), but I don't know which scheduler is used to materialize the Mono returned by my function.
Is there an elegant way to deal with this?
fun myFunction(): Mono<String> {
    return Mono.just("example")
        .flatMap { value ->
            Mono.fromFuture {
                // invocation of 3rd party library that executes Future on the thread created in native code.
            }
        }
        .map {
            val resource = Thread.currentThread().getContextClassLoader().getResources("META-INF/services/blah_blah");
            // NullPointerException because Thread.currentThread().getContextClassLoader() returns NULL
            resource.asSequence().first().toString()
        }
}
It is not possible, because there's no guarantee that there is a Scheduler at all.
The place where the subscription is made and the data starts flowing could simply be a Thread. There is no mechanism in Java that allows an external actor to submit a task to an arbitrary thread (you have to provide the Runnable at Thread construction).
So no, there's no way of "returning to the previous Scheduler".
Usually, this shouldn't be an issue at all. If your code is reactive, it should also be non-blocking and thus able to "share" whichever thread it currently runs on with other computations.
If your code is blocking, it should off-load the work to a blocking-compatible Scheduler anyway, which you should explicitly choose. Typically: publishOn(Schedulers.boundedElastic()). This is also true for CPU-intensive tasks, by the way.

Dataflow/Apache Beam: How can I access state from @FinishBundle?

Our pipeline buffers events and does an external fetch (for enrichment) every 500 events. When a timer fires, the buffered events are processed. Of course, when you have e.g. 503 events, there will be 3 events that were not enriched.
From experiments we learned that @FinishBundle is always called before the timer. It even seems that the result of the bundle is committed before the timer is executed (checkpointing?). If we could access the state from @FinishBundle and perform an enrichment on these last events, they would be part of the committed bundle.
I believe this would solve our exactly-once problem: currently the timer also needs to fetch, and will do this again upon re-execution. If we could adjust the state in @FinishBundle, the fetch would no longer be needed, and when the timer re-executes it would start from the state.
Apparently, it is not possible to access the state from the @FinishBundle function, as the following gives errors:
@FinishBundle
public void finishBundle(FinishBundleContext c,
    @StateId("buffer") BagState<AwesomeEvent> bufferState) {
  LOG.info("--- FINISHBUNDLE CALLED ---");
  // TODO: ENRICHMENT
  LOG.info("--- FINISHBUNDLE DONE ---");
}
Am I doing something wrong or is there another way of accessing the state from this function?
Also, am I making the correct assessment about the timer behavior?

Does async operation in iOS create a new thread internally, and allocate task to it?

Does an async operation in iOS internally create a new thread and allocate the task to it?
An async operation is capable of internally creating a new thread and allocating a task to it. But for that to happen, you need to run an async operation that actually creates a new thread and allocates a task to it. In other words: there is no direct correlation.
I assume that by async you mean something like DispatchQueue.main.async { <#code here#> }. This does not create a new thread, as the main thread should already be present. How and why this works can be explained (if oversimplified) with an array of operations and an endless loop, which is basically what RunLoop is there for. Imagine the following:
Array<Operations> allOperations;
int main() {
    bool continueRunning = true;
    for(;continueRunning;) {
        allOperations.forEach { $0.run(); }
        allOperations.clear();
    }
    return 0;
}
When you call something like DispatchQueue.main.async, it basically creates a new operation and inserts it into allOperations. The same thread will eventually start a new pass (within the for loop) and call your operation asynchronously. Again, keep in mind that this is all oversimplified just to illustrate the idea. From this you can also imagine how, for instance, timers work: the operation evaluates whether the current time is greater than the time of the next scheduled execution and, if so, triggers the timer's operation. That is also why timers cannot be very precise: they depend on the rest of the execution, and the thread may be busy.
A new thread, on the other hand, may be spawned when you create a new queue: DispatchQueue(label: "Will most likely run on a new thread"). When (or if) exactly a thread will be created is not fixed. It may vary between implementations and the systems being run on. The tool only guarantees that it will do what it is designed for, not how it will do it.
And then there is also the Thread class, which can generate a new thread. But the deal is the same as for the previous one; it might internally create a new thread instantly, or it might do it later, lazily. All it guarantees is that it will work according to its public interface.
I am not saying that these things do change over time, implementation, or the system they run on. I am only saying that they potentially could, and that they might have.
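To make the timer point concrete, here is a toy model of such a loop, written in JavaScript purely for illustration. ToyRunLoop, enqueue, schedule, and tick are made-up names, and this is not how RunLoop or DispatchQueue is actually implemented; it only shows why an enqueued operation runs later on the same thread, and why a timer cannot fire before the thread gets around to its next pass.

```javascript
// Toy model: one thread draining a queue of operations, pass by pass.
class ToyRunLoop {
  constructor() {
    this.operations = []; // work added via enqueue()
    this.timers = [];     // { fireAt, operation } entries added via schedule()
  }

  // Comparable in spirit to dispatching async work onto this loop.
  enqueue(operation) {
    this.operations.push(operation);
  }

  // Registers an operation that becomes due at logical time `fireAt`.
  schedule(fireAt, operation) {
    this.timers.push({ fireAt, operation });
  }

  // One pass of the loop at logical time `now`.
  tick(now) {
    const due = this.timers.filter(timer => timer.fireAt <= now);
    this.timers = this.timers.filter(timer => timer.fireAt > now);
    const batch = this.operations.concat(due.map(timer => timer.operation));
    this.operations = [];
    batch.forEach(operation => operation(now));
  }
}

const log = [];
const loop = new ToyRunLoop();

log.push('before enqueue');
loop.enqueue(() => log.push('enqueued operation'));
loop.schedule(5, now => log.push(`timer for t=5 fired at t=${now}`));
log.push('after enqueue');

loop.tick(0); // runs the enqueued operation; the timer is not due yet
loop.tick(7); // the thread was busy until t=7, so the timer fires late

console.log(log);
// [ 'before enqueue', 'after enqueue', 'enqueued operation',
//   'timer for t=5 fired at t=7' ]
```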

Stream function calls are async in Google Dart?

Why does Dart call my function "aFunction" after Step2? When I execute this code, the text below is printed in the console:
Step2
Step1
My code:
void main()
{
  ...
  stream.listen(aFunction);
  print("Step2");
  ...
}
void aFunction()
{
  print("Step1");
}
Thanks for help.
One of the few promises that a Dart Stream makes is that it generates no events in response to a listen call.
The events may come at a later time, but the code calling 'listen' is allowed to continue, and complete, before the first event is fired.
We originally allowed streams to fire immediately on a listen, but when we tried to program with that, it was completely impossible to control in practice.
The same is true for listening on a future, for example with 'then'. The callback will never come immediately.
Events should generally act as if they were fired by the top-level event loop, so the event handler doesn't have to worry if other code is running - other code that might not be reentrant.
That is not always the case in practice. One event handler may trigger other events through a synchronous stream controller, effectively turning one event into another. That requires the event handler to know what it is doing. Synchronous controllers are intended for internal use inside, e.g., a stream transformer, and using a synchronous stream controller isn't recommended in general.
So, no, you can't have the listen call immediately trigger the callback.
You can listen to a stream synchronously if you created a StreamController with the sync option enabled. Here is an example to get what you describe:
var controller = new StreamController<String>(sync: true);
var stream = controller.stream.asBroadcastStream();
stream.listen((text) => print(text));
controller.add("Step1");
print("Step2");

Launching multiple async futures in response to events

I would like to launch a fairly expensive operation in response to a user clicking on a canvas element.
mouseDown(MouseEvent e) {
  print("entering event handler");
  var future = new Future<int>(expensiveFunction);
  future.then((int value) => redrawCanvas(value));
  print("done event handler");
}
expensiveFunction() {
  for(int i = 0; i < 1000000000; i++){
    //do something insane here
  }
}
redrawCanvas(int value) {
  //do stuff here
  print("redrawing canvas");
}
My understanding of M4 Dart is that this Future constructor should launch "expensiveFunction" asynchronously, i.e., on a different thread from the main one. And it does appear this way, as "done event handler" is immediately printed in my output window in the IDE, and then some time later "redrawing canvas" is printed. However, if I click on the element again, nothing happens until "expensiveFunction" from the previous click has finished running.
How do I use futures to simply launch a compute-intensive function on a new thread, so that I can have several of them queued up in response to multiple clicks, even if the first future is not complete yet?
Thanks.
As mentioned in a different answer, Futures are just a "placeholder for a value that is made available in the future". They don't necessarily imply concurrency.
Dart has a concept of isolates for concurrency. You can spawn an isolate to run some code in a parallel thread or process.
dart2js can compile isolates into Web Workers. A Web Worker can run in a separate thread.
Try something like this:
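import 'dart:isolate';
// Entry point of the spawned isolate; it receives the port to reply on.
expensiveOperation(SendPort replyTo) {
  // doExpensiveThing is a stand-in for your own expensive function.
  var result = doExpensiveThing();
  replyTo.send(result);
}
main() async {
  var receive = new ReceivePort();
  var isolate = await Isolate.spawn(expensiveOperation, receive.sendPort);
  // Wait for the first message sent back by the isolate.
  var result = await receive.first;
  print(result);
}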
(I haven't tested the above, but something like it should work.)
Event Loop & Event Queue
You should note that Futures are not threads. They do not run concurrently, and in fact, Dart code within an isolate is single-threaded. All Dart code runs in an event loop.
The event loop is a loop that runs as long as the current Dart isolate is alive. When you call main() to start a Dart application, the isolate is created, and it is no longer alive after the main method is completed and all items on the event queue are completed as well.
The event queue is the set of all functions that still need to finish executing. Because Dart is single threaded, all of these functions need to run one at a time. So when one item in the event queue is completed, another one begins. The exact timing and scheduling of the event queue is something that's way more complicated than I can explain myself.
Therefore, asynchronous processing is important to prevent the single thread from being blocked by some long running execution. In a UI, a long process can cause visual jankiness and hinder your app.
Futures
Futures represent a value that will be available sometime in the Future, hence the name. When a Future is created, it is returned immediately, and execution continues.
The callback associated with that Future (in your case, expensiveFunction) is "started" by being added to the event queue. When the currently running code returns to the event loop, the callback runs, and as soon as it can, so does the code passed to then.
Streams
Because your Futures are by definition asynchronous, and you don't know when they return, you want to queue up your callbacks so that they remain in order.
A Stream is an object that emits events that can be subscribed to. When you write canvasElement.onClick.listen(...) you are asking for the onClick Stream of MouseEvents, which you then subscribe to with listen.
You can use Streams to queue up events and register a callback on those events to run the code you'd like.
What to Write
import 'dart:async';
import 'dart:html';
main() {
  // Used to add events to a stream.
  var controller = new StreamController<Future>();
  // Pause when we get an event so that we take one value at a time.
  // Declared before it is assigned so that the callback can refer to it.
  late StreamSubscription<Future> subscription;
  subscription = controller.stream.listen((_) => subscription.pause());
  var canvas = new CanvasElement();
  canvas.onClick.listen((MouseEvent e) {
    print("entering event handler");
    var future = new Future<int>(expensiveFunction);
    // Resume subscription after our callback is called.
    // Pass a closure; calling resume() directly here would run it right away.
    controller.add(
        future.then(redrawCanvas).then((_) => subscription.resume()));
    print("done event handler");
  });
}
int expensiveFunction() {
  for (int i = 0; i < 1000000000; i++) {
    // do something insane here
  }
  return 0; // placeholder result
}
redrawCanvas(int value) {
  // do stuff here
  print("redrawing canvas");
}
Here we are queuing up our redrawCanvas callbacks by pausing after each mouse click, and then resuming after redrawCanvas has been called.
More Information
See also this great answer to a similar question.
A great place to start reading about Dart's asynchrony is the first part of this article about the dart:io library and this article about the dart:async library.
For more information about Futures, see this article about Futures.
For Streams information, see this article about adding to Streams and this article about creating Streams.

Resources