Service Worker: pitfalls of self.skipWaiting() and self.clients.claim()

To immediately activate a service worker after it's installed, I use self.skipWaiting() in the install listener. To immediately take control of a page (without the need for a page navigation, e.g. page load), I use self.clients.claim(). I understand that doing such things means:
A page could first load without being under the control of a Service Worker, but then be taken over by one during its lifespan.
A page could start under the control of version 1 of Service Worker but then be taken over by version 2 during its lifespan.
There are all kinds of warnings online about doing such things, but I don't see the pitfalls. Perhaps one potential problem is if the controlled page does some initial handshake or setup with a Service Worker when it first loads. That obviously will be missed when the new Service Worker activates in the background, but even then, the Service Worker could message its controlling pages to notify them of the change.
It seems to me that most applications, under most scenarios, would benefit significantly from using both self.skipWaiting() and self.clients.claim() without any downside. Did I miss something?
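For concreteness, the pattern I'm describing is only a few lines in the service worker. A minimal sketch (the cache name and precache list are made-up placeholders):

```js
// sw.js - a minimal sketch; CACHE_NAME and PRECACHE_URLS are placeholders.
const CACHE_NAME = 'app-cache-v2';
const PRECACHE_URLS = ['/', '/app.js', '/app.css'];

self.addEventListener('install', (event) => {
  // Don't wait behind an old worker: activate as soon as install finishes.
  self.skipWaiting();
  event.waitUntil(
    caches.open(CACHE_NAME).then((cache) => cache.addAll(PRECACHE_URLS))
  );
});

self.addEventListener('activate', (event) => {
  // Take control of all open pages immediately, without a navigation.
  event.waitUntil(self.clients.claim());
});
```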

The pitfalls of self.skipWaiting() are described really well here (thanks to #RobertRowntree for the link):
https://redfin.engineering/how-to-fix-the-refresh-button-when-using-service-workers-a8e27af6df68
As for self.clients.claim(), I still haven't seen a compelling argument against it, but when I do, I'll update my answer.

Related

Difference between clientsClaim and skipWaiting

I'm trying to understand the difference between skipWaiting and clientsClaim. In my understanding: calling skipWaiting will cause the new service worker to skip the waiting phase and become active right away, and clientsClaim can then 'claim' any other open tabs as well.
What I gather from documentation online:
skipWaiting skips the waiting phase, and the worker becomes active right away
clientsClaim immediately starts controlling pages
In every post I find online, I almost always see clientsClaim and skipWaiting used together.
However, I recently found a service worker that only uses clientsClaim, and I'm having a hard time wrapping my head around what actually is the difference between clientsClaim and skipWaiting, and in what scenario do you use clientsClaim but not skipWaiting?
My thinking on this (and this may be where I'm wrong) is that calling clientsClaim but not skipWaiting is redundant, considering:
The new service worker will become active when all open pages are closed (because we're not using skipWaiting)
When our new service worker is activated, we call clientsClaim, even though we just closed all open pages to even activate the new service worker. There should be no other pages to control, because we just closed them.
Could someone help me understand?
What I've tried so far:
Read the documentation on skipWaiting
Read the documentation on clientsClaim
Read about the service worker lifecycle by Jake Archibald, and played around with this demo
Read a bunch of Stack Overflow posts, the offline cookbook, different blog posts, etc.
self.skipWaiting() does exactly what you described:
forces the waiting service worker to become the active service worker
"Active" in this sense does not mean any currently loaded clients are now talking to that service. It instead means that service is now the service to be used whenever a new client requests it.
This is where Clients.claim() comes in:
When a service worker is initially registered, pages won't use it until they next load.
Without calling claim, any existing clients will still continue to talk to the older service worker until a full page load.
While most of the time it makes sense to use skipWaiting and Clients.claim in conjunction, that is not always the case. If there is a chance of a poor experience for the user due to a service worker not being backwards compatible, Clients.claim should not be called. Instead, the next time a client is refreshed or loaded, it would now have the new service worker without worry of the breaking change.
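To make that concrete, here is a sketch of the claim-only worker the question asks about. Note that on a first-ever install there is no old worker to wait behind, so skipWaiting() would be a no-op anyway; clients.claim() is what lets that first worker control the already-open, uncontrolled pages:

```js
// sw.js - clientsClaim without skipWaiting (a sketch).
self.addEventListener('activate', (event) => {
  // First install: this worker activates right away (there is nothing to
  // wait behind), but the pages that registered it remain uncontrolled
  // until they reload. claim() takes them over without a reload.
  //
  // Update: because we never call skipWaiting(), we only get here once
  // the old worker has been released (all its tabs closed). claim() then
  // only matters for any pages that were never controlled at all.
  event.waitUntil(self.clients.claim());
});
```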
The difference between skipWaiting() and Clients.claim() in Service Workers
An important concept to understand is that for a service worker to become operational on a page it must be the controller of the page. (You can actually see this property in navigator.serviceWorker.controller.) To become the controller, the service worker must first be activated, but that's not enough in itself. A page can only be controlled if it has also been requested through a service worker.
Normally, this is the case, particularly if you're just updating a service worker. If, on the other hand, you're registering a service worker for the first time on a page, then the service worker will be installed and activated but it will not become the controller of the page because the page was not requested through a service worker.
You can fix this by calling Clients.claim() somewhere in the activate handler. This simply means that you won't have to refresh the page before you see the effects of the service worker.
There's some question as to how useful this actually is. Jake Archibald, one of the authors of the spec, has this to say about it:
I see a lot of people including clients.claim() as boilerplate, but I rarely do so myself. It only really matters on the very first load, and due to progressive enhancement the page is usually working happily without service worker anyway.
As regarding its use with other tabs, it will again only have any effect if those tabs were not requested through a service worker. It's possible to have a scenario where a user has the same page open in different tabs and has these tabs open for a long period of time, during which the developer introduces a service worker. If the user refreshes one tab but not the other, one tab will have the service worker and the other will not. But this scenario seems somewhat uncommon.
skipWaiting()
A service worker is activated after it is installed, but only if there is no other service worker currently controlling pages within the scope. In other words, if you have any number of tabs open for a page that is being controlled by the old service worker, then the new service worker will not activate. You can therefore activate the new service worker by closing all open tabs. After this, the old service worker controls zero pages, and so the new service worker can become active.
If you don’t want to wait for the old service worker to be killed, you can call skipWaiting(). Normally, this is done within the install event handler. Even if the old service worker is controlling pages, it is killed anyway and this allows the new service worker to be activated.
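The hand-over is also observable from the page, which is where the refresh problem described in the Redfin article linked above shows up. A common page-side sketch is to reload exactly once when a new worker takes control:

```js
// main.js - page-side companion sketch.
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js');

  // null until some service worker controls this page (e.g. on the very
  // first visit, before the worker calls clients.claim()).
  console.log('controlled:', navigator.serviceWorker.controller !== null);

  // Fires when a new worker takes over, e.g. via skipWaiting() + claim().
  let reloaded = false;
  navigator.serviceWorker.addEventListener('controllerchange', () => {
    if (reloaded) return; // guard against reload loops
    reloaded = true;
    window.location.reload();
  });
}
```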

Nagios: Make sure x out of y services are running

I'm introducing 24/7 monitoring for our systems. To avoid unnecessary pages in the middle of the night, I want Nagios to NOT page me if only one or two of the service checks fail, as this won't have any impact on users: the other servers run the same service and the impact on users is almost zero, so fixing the problem can wait until the next day.
But: I want to get paged if too many of the checks fail.
For example: 50 servers run the same service, 2 fail -> I can still sleep.
The service fails on 15 servers -> I get paged because the impact is getting too high.
What I could do is add a lot (!) of notification dependencies that only trigger if numerous hosts are down. The problem: even though I can specify to get paged if 15 hosts are down, I still have to define exactly which hosts need to be down for this alert to be sent. Instead, I want to specify that if ANY 15 hosts are down, a page is made.
I'd be glad if somebody could help me with that.
Personally I'm using Shinken, which has business rules just for that. Shinken is backward compatible with Nagios, so it's easy to drop your Nagios configuration into Shinken.
It seems there is a similar addon for Nagios, the Nagios Business Process Intelligence addon, but I have no experience with it.
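For stock Nagios, the bundled check_cluster plugin can also express "page me when N of these fail". A hedged sketch with placeholder host and service names (the @5: / @15: ranges are meant as "alert when 5 or more / 15 or more member states are non-OK"; check your plugin version's --help for the exact threshold syntax):

```
# commands.cfg: aggregate many service states into one cluster check
define command {
    command_name  check_service_cluster
    command_line  $USER1$/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}

# services.cfg: WARNING at 5 failed members, CRITICAL (page) at 15
define service {
    use                  generic-service
    host_name            nagios-server
    service_description  My Service Cluster
    check_command        check_service_cluster!"My Service"!@5:!@15:!$SERVICESTATEID:web01:My Service$,$SERVICESTATEID:web02:My Service$,$SERVICESTATEID:web03:My Service$
}
```

Notifications then hang off this single cluster service rather than the 50 individual checks, so any 15 failing servers trigger the page, regardless of which ones they are.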

Automating Azure VIP Swap

I have an ASP.NET MVC 4 app hosted as an Azure web role. I want to do something that seems like it should be pretty standard: I want to create a function that I can call that initiates a VIP swap and raises an event (or calls a callback) when the VIP swap operation is done.
Just to add some context to the situation: My website implements a workflow that takes about an hour (or less) to complete. If I want to release a new version of the website code, it's convenient (i.e. much less "backward compatibility" code to write) to first let all of the current users complete the workflow so that the new code doesn't need to deal with data created by the previous version of the code. So a management function in my website would first poke a value into the database that disables new workflows; it would then wait until all current workflows are done; it would then call the "VIP Swap" routine; finally, when the VIP Swap routine signals its completion, it would poke the database value to re-enable new workflows.
I found the Microsoft documentation for how to programmatically initiate a VIP swap here:
http://msdn.microsoft.com/en-us/library/ee460814.aspx
The procedure involves POSTing to a magic URL and including some headers in the POST, then periodically performing a GET to a magic URL and checking the response code.
The more I think about this, the more non-trivial it seems. In addition to the basic complexities of wiring up a background timer and completion notification, I don't know what complexities, if any, I might run into trying to do this stuff in the IIS environment. Can I even perform HTTP operations on a background thread? For that matter, will I run into complications just trying to use any of the half dozen or so different "do things in the background" mechanisms baked into .NET?
Any help or guidance will be greatly appreciated. In particular, I'd be ecstatic if someone could point me at a ready-to-go implementation of this function!
I don't think you will find an easy solution to this, as the fabric controller is set up to do some very fancy things without your involvement. Running hour-long workflows in a cloud computing environment, where an instance can be pulled out from underneath you (with a maximum of 5 minutes from the OnStopping event being called to clean up), requires that you do other work anyway to make sure that all of your tasks complete.
The simple question is "What do you do if an instance goes down while workflows are still running?" Do you restart them, or are they lost? If they get lost then you don't care anyway, so killing workflows for an upgrade is equally unimportant. If you restart them, then use that same mechanism to decide whether or not a node is due to be shut down, and distribute the jobs accordingly. This pattern is eerily similar to the Hadoop JobTracker: don't just run the workflows on any ol' instance. Submit them to a (job tracker) service that decides what to do. The (job tracker) service can then use the Service Management API to scale up as many instances as you need running the version that you want, run workflows on the appropriate nodes, and shut them down when they are no longer needed or are outdated.
Unfortunately this may not be the simple solution that you are looking for, but something in your architecture needs to change rather than trying to force PaaS to fit your current approach. Decompose your workloads, create loosely coupled services, design for failure; these and a few other cloud/distributed computing practices need to be considered. There is a reason why Hadoop is built the way that it is, and it has a reputation for being able to get work done on a bunch of somewhat unreliable commodity hardware.
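For reference, the REST flow from the MSDN page linked in the question is small enough to sketch. This is a rough illustration only, shown as a Node.js script; the subscription ID, deployment names, certificate files, and API version header are placeholders to adapt to your setup:

```js
// vip-swap.js - hedged sketch of the Swap Deployment + status-poll flow.
const https = require('https');
const fs = require('fs');

const subscriptionId = '<subscription-id>';   // placeholder
const serviceName = '<hosted-service-name>';  // placeholder

// The (classic) Service Management API authenticates with a management
// certificate; these file paths are placeholders.
const tlsOptions = {
  key: fs.readFileSync('management.key'),
  cert: fs.readFileSync('management.pem'),
};

function call(method, path, body) {
  return new Promise((resolve, reject) => {
    const req = https.request({
      host: 'management.core.windows.net',
      method,
      path,
      headers: { 'x-ms-version': '2012-03-01', 'Content-Type': 'application/xml' },
      ...tlsOptions,
    }, (res) => {
      let data = '';
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, headers: res.headers, body: data }));
    });
    req.on('error', reject);
    if (body) req.write(body);
    req.end();
  });
}

async function vipSwap(productionDeployment, stagingDeployment) {
  // 1. POST the swap request; the async operation id comes back in the
  //    x-ms-request-id response header.
  const swapXml =
    '<Swap xmlns="http://schemas.microsoft.com/windowsazure">' +
    `<Production>${productionDeployment}</Production>` +
    `<SourceDeployment>${stagingDeployment}</SourceDeployment></Swap>`;
  const post = await call('POST',
    `/${subscriptionId}/services/hostedservices/${serviceName}`, swapXml);
  const requestId = post.headers['x-ms-request-id'];

  // 2. Poll Get Operation Status until it leaves InProgress.
  for (;;) {
    const poll = await call('GET', `/${subscriptionId}/operations/${requestId}`);
    if (!/<Status>InProgress<\/Status>/.test(poll.body)) return poll.body;
    await new Promise((resolve) => setTimeout(resolve, 10000)); // 10s between polls
  }
}
```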

How do I prevent another process from terminating my service?

I have a Windows service that performs various "jobs" for my application (send emails, create backups, check for my application updates, provide some services...)
Recently some customers reported conflicts between some Internet banking sites and my application.
In searching for solutions, I found reports about a plugin (ActiveX) installed by the Internet banking Web site.
This ActiveX control installs a bizarre service (GbPlugin, from GAS Tecnologia) that kills suspicious applications based on some idiotic heuristic, and my service is a victim!
Now I'm trying to "immunize" my service.
Are there some ways to restrict the termination of my service to protect it?
I cannot use the "auto restart" option in the service properties, because I cannot be killed!
Both services are running as LOCALSYSTEM.
Most likely that service runs as LOCALSYSTEM and so can kill anything it likes. So it's extremely unlikely that you can defend against it.
Indeed, a quick web search throws up some hits that indicate that the service does indeed run as LOCALSYSTEM.
Your only tenable solution is going to involve the other software. Either compel your users to remove it, or work with its developers to find a way to white-list your program.
Assuming GbPlugin is going through normal SCM procedures to stop services and not just brute-force terminating them, then you have a couple of choices to prevent your service from stopping:
set your service's AllowStop property to False.
in the OnStop event, set the Stopped parameter to False.
Either approach will also prevent you from stopping your own service under normal conditions. To work around that, you could write a separate app that uses the Win32 API ControlService() function to send a custom command to your service. Inside your service, have it override the virtual DoCustomControl() method to look for that command. Have it either reset the AllowStop property back to True, or set a flag somewhere that the OnStop event can look at, then call Controller(SERVICE_CONTROL_STOP) to initiate a normal stop.
Needless to say, this is a bit overkill. If possible, a better option is to simply contact GAS Tecnologia and ask why your service is being flagged by GbPlugin's heuristics and then change that condition in your service, or else ask them to fix GbPlugin to ignore your service.

Best way to run rails with long delays

I'm writing a Rails web service that interacts with various pieces of hardware scattered throughout the country.
When a call is made to the web service, the Rails app then attempts to contact the appropriate piece of hardware, get the needed information, and reply to the web client. The time between the client's call and the reply may be up to 10 seconds, depending upon lots of factors.
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
I basically see two options: either run JRuby and use multithreading, or run several regular Ruby instances and hope that not many people try to use the service at a time. JRuby seems like the much better solution, but it still doesn't seem to be mainstream or to have out-of-the-box support at Heroku and EngineYard. The multiple-instance solution seems like a total kludge.
1) Am I right about my two options? Is there a better one I'm missing?
2) Is there an easy deployment option for JRuby?
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
From an engineering perspective, this seems like it would be the best alternative.
Why don't you want to do it?
There's a third option: If you host your Rails app with Passenger and enable global queueing, you can do this transparently. I have some actions that take several minutes, with no issues (caveat: some browsers may time out, but that may not be a concern for you).
If you're worried about browser timeout, or you cannot control the deployment environment, you may want to process it in the background:
User requests data
You enter request into a queue
Your web service returns a "ticket" identifier to check the progress
A background process processes the jobs in the queue
The user polls back, referencing the "ticket" id
As far as hosting in JRuby, I've deployed a couple of small internal applications using the glassfish gem, but I'm not sure how much I would trust it for customer-facing apps. Just make sure you run config.threadsafe! in production.rb. I've heard good things about Trinidad, too.
You can also run the web service call in a delayed background job so that it's not tying up a web server, and it can even run on a separate physical box. This is also a much more scalable approach. If you make the web call using AJAX, then you can ping the server every second or two to see if your results are ready; that way your client is not held in limbo while the results are being calculated, and the request does not time out.
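Client-side, the "ping every second or two" part is a small amount of JavaScript. A sketch, assuming hypothetical /jobs endpoints that enqueue work and report its status as JSON:

```js
// poll.js - ticket-and-poll sketch; the /jobs API shape is an assumption.
async function callSlowService(params) {
  // 1. Enqueue the request; the server replies immediately with a ticket.
  const enqueued = await fetch('/jobs', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(params),
  });
  const { ticketId } = await enqueued.json();

  // 2. Poll every 2 seconds until the background job finishes.
  for (;;) {
    const res = await fetch(`/jobs/${ticketId}`);
    const job = await res.json();
    if (job.state === 'done') return job.result;
    if (job.state === 'failed') throw new Error(job.error);
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}
```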
