Aren't PWAs user unfriendly if the service worker is not immediately active? - service-worker

I posted another question as a brute-force solution to this one (Angular: fully install service worker before anything else) but I thought I'd make a separate one to discuss the use case for when a service worker is used as intended.
According to the service worker life cycle (https://developers.google.com/web/fundamentals/primers/service-workers/lifecycle), the SW is installed but it's only active once you then reload the page (you can claim() the page but that's only for calls that happen after the service worker is installed). The reasoning is that if and existing version is updated, the old one and the new one do not mix states and caches. I can agree with that decision.
What I have trouble understanding is why it is not immediately active once it is initially installed. Instead, it requires a page reload unless you explicitly define precaching rules in the SW. If you define caching rules with wildcards, it's not possible to precache those so you need the reload.
Given a single page PWA (like Angular), a user will discover the site and browser around on it but the page will never be reloaded during that session. If they then want to use the site offline later, they need to have refreshed or re-opened the tab at least one other time. That seems like a pretty big pitfall to me.
Am I missing something here?

Your understanding of the service worker lifecycle is correct but I do not think the pitfall you mentioned is as severe as you think it is.
If I understand you correctly, the user experience will only be negatively affected if the user loses connectivity during the initial browsing of the page (before the service worker is active) and is missing an offline asset. If this is truly a scenario you want to account for then that offline asset can be pre-cached in the browser-side javascript. Alternatively, as you mentioned, you can skipWaiting() and claim() to make the service worker active without the user refreshing the page.

Related

New build deployed in a domain (https://example.com) is not getting reflected as the previous build has a service worker running [duplicate]

I'm playing with the service worker API in my computer so I can grasp how can I benefit from it in my real world apps.
I came across a weird situation where I registered a service worker which intercepts fetch event so it can check its cache for requested content before sending a request to the origin.
The problem is that this code has an error which prevented the function from making the request, so my page is left blank; nothing happens.
As the service worker has been registered, the second time I load the page it intercepts the very first request (the one which loads the HTML). Because I have this bug, that fetch event fails, it never requests the HTML and all I see its a blank page.
In this situation, the only way I know to remove the bad service worker script is through chrome://serviceworker-internals/ console.
If this error gets to a live website, which is the best way to solve it?
Thanks!
I wanted to expand on some of the other answers here, and approach this from the point of view of "what strategies can I use when rolling out a service worker to production to ensure that I can make any needed changes"? Those changes might include fixing any minor bugs that you discover in production, or it might (but hopefully doesn't) include neutralizing the service worker due to an insurmountable bug—a so called "kill switch".
For the purposes of this answer, let's assume you call
navigator.serviceWorker.register('service-worker.js');
on your pages, meaning your service worker JavaScript resource is service-worker.js. (See below if you're not sure the exact service worker URL that was used—perhaps because you added a hash or versioning info to the service worker script.)
The question boils down to how you go about resolving the initial issue in your service-worker.js code. If it's a small bug fix, then you can obviously just make the change and redeploy your service-worker.js to your hosting environment. If there's no obvious bug fix, and you don't want to leave your users running the buggy service worker code while you take the time to work out a solution, it's a good idea to keep a simple, no-op service-worker.js handy, like the following:
// A simple, no-op service worker that takes immediate control.
self.addEventListener('install', () => {
// Skip over the "waiting" lifecycle state, to ensure that our
// new service worker is activated immediately, even if there's
// another tab open controlled by our older service worker code.
self.skipWaiting();
});
/*
self.addEventListener('activate', () => {
// Optional: Get a list of all the current open windows/tabs under
// our service worker's control, and force them to reload.
// This can "unbreak" any open windows/tabs as soon as the new
// service worker activates, rather than users having to manually reload.
self.clients.matchAll({type: 'window'}).then(windowClients => {
windowClients.forEach(windowClient => {
windowClient.navigate(windowClient.url);
});
});
});
*/
That should be all your no-op service-worker.js needs to contain. Because there's no fetch handler registered, all navigation and resource requests from controlled pages will end up going directly against the network, effectively giving you the same behavior you'd get without if there were no service worker at all.
Additional steps
It's possible to go further, and forcibly delete everything stored using the Cache Storage API, or to explicitly unregister the service worker entirely. For most common cases, that's probably going to be overkill, and following the above recommendations should be sufficient to get you in a state where your current users get the expected behavior, and you're ready to redeploy updates once you've fixed your bugs. There is some degree of overhead involved with starting up even a no-op service worker, so you can go the route of unregistering the service worker if you have no plans to redeploy meaningful service worker code.
If you're already in a situation in which you're serving service-worker.js with HTTP caching directives giving it a lifetime that's longer than your users can wait for, keep in mind that a Shift + Reload on desktop browsers will force the page to reload outside of service worker control. Not every user will know how to do this, and it's not possible on mobile devices, though. So don't rely on Shift + Reload as a viable rollback plan.
What if you don't know the service worker URL?
The information above assumes that you know what the service worker URL is—service-worker.js, sw.js, or something else that's effectively constant. But what if you included some sort of versioning or hash information in your service worker script, like service-worker.abcd1234.js?
First of all, try to avoid this in the future—it's against best practices. But if you've already deployed a number of versioned service worker URLs already and you need to disable things for all users, regardless of which URL they might have registered, there is a way out.
Every time a browser makes a request for a service worker script, regardless of whether it's an initial registration or an update check, it will set an HTTP request header called Service-Worker:.
Assuming you have full control over your backend HTTP server, you can check incoming requests for the presence of this Service-Worker: header, and always respond with your no-op service worker script response, regardless of what the request URL is.
The specifics of configuring your web server to do this will vary from server to server.
The Clear-Site-Data: response header
A final note: some browsers will automatically clear out specific data and potentially unregister service workers when a special HTTP response header is returned as part of any response: Clear-Site-Data:.
Setting this header can be helpful when trying to recover from a bad service worker deployment, and kill-switch scenarios are included in the feature's specification as an example use case.
It's important to check the browser support story for Clear-Site-Data: before your rely solely on it as a kill-switch. As of July 2019, it's not supported in 100% of the browsers that support service workers, so at the moment, it's safest to use Clear-Site-Data: along with the techniques mentioned above if you're concerned about recovering from a faulty service worker in all browsers.
You can 'unregister' the service worker using javascript.
Here is an example:
if ('serviceWorker' in navigator) {
navigator.serviceWorker.getRegistrations().then(function (registrations) {
//returns installed service workers
if (registrations.length) {
for(let registration of registrations) {
registration.unregister();
}
}
});
}
That's a really nasty situation, that hopefully won't happen to you in production.
In that case, if you don't want to go through the developer tools of the different browsers, chrome://serviceworker-internals/ for blink based browsers, or about:serviceworkers (about:debugging#workers in the future) in Firefox, there are two things that come to my mind:
Use the serviceworker update mechanism. Your user agent will check if there is any change on the worker registered, will fetch it and will go through the activate phase again. So potentially you can change the serviceworker script, fix (purge caches, etc) any weird situation and continue working. The only downside is you will need to wait until the browser updates the worker that could be 1 day.
Add some kind of kill switch to your worker. Having a special url where you can point users to visit that can restore the status of your caches, etc.
I'm not sure if clearing your browser data will remove the worker, so that could be another option.
I haven't tested this, but there is an unregister() and an update() method on the ServiceWorkerRegistration object. you can get this from the navigator.serviceWorker.
navigator.serviceWorker.getRegistration('/').then(function(registration) {
registration.update();
});
update should then immediately check if there is a new serviceworker and if so install it. This bypasses the 24 hour waiting period and will download the serviceworker.js every time this javascript is encountered.
For live situations you need to alter the service worker at byte-level (put a comment on the first line, for instance) and it will be updated in the next 24 hours. You can emulate this with the chrome://serviceworker-internals/ in Chrome by clicking on Update button.
This should work even for situations when the service worker itself got cached as the step 9 of the update algorithm set a flag to bypass the service worker.
We had moved a site from godaddy.com to a regular WordPress install. Client (not us) had a serviceworker file (sw.js) cached into all their browsers which completely messed things up. Our site, a normal WordPress site, has no service workers.
It's like a virus, in that it's on every page, it does not come from our server and there is no way to get rid of it easily.
We made a new empty file called sw.js on the root of the server, then added the following to every page on the site.
<script>
if (navigator && navigator.serviceWorker && navigator.serviceWorker.getRegistration) {
navigator.serviceWorker.getRegistration('/').then(function(registration) {
if (registration) {
registration.update();
registration.unregister();
}
});
}
</script>
In case it helps someone else, I was trying to kill off service workers that were running in browsers that had hit a production site that used to register them.
I solved it by publishing a service-worker.js that contained just this:
self.globalThis.registration.unregister();

Why is self.skipWaiting() and self.clients.claim() not default behaviour for service workers

I'm researching service workers for my thesis. I understand how the lifecycle works, but I'm having trouble understanding the default update behaviour of service workers.
When installing a new service worker, while an old one is installed, the service worker will have to wait to activate. With self.skipWaiting() and self.clients.claim() it is possible to fully activate the service worker and control the pages. I don't get why this is not default behaviour. The main reason I can find is to preserve code and data consistency (https://redfin.engineering/service-workers-break-the-browsers-refresh-button-by-default-here-s-why-56f9417694). With some basic understanding of the lifecycle, shouldn't it be possible to preserve both code and data consistency when a service worker updates or am I missing something? Are there any additional reasons?
Also has this behaviour been different in the past? Have skipWaiting() and clients.claim() been added afterwards?
The default - as it is now - is safer in general and doesn't force everyone to come up with all sorts of solutions.
User loads page with main1.js, SWv1 registers 1 second later, site now fully cached
User loads the page again - this time from cache by SWv1, super fast. New SWv2 registers 1 second later, caches new assets (main1.js is now main2.js), takes control via skipWaiting and clientsClaim
Two things can happen now:
Page has loaded with main1.js and the browser has executed whatever that script said. User has interacted with the page etc. Page is running main1.js which expects to be talking to SWv1 but actually the SW in control is SWv2. The script, main1.js, could be sending messages and trying to interact with the SW in a way that only SWv1 understood but v2 doesn't have any idea about. Now the page breaks because of the mismatch.
SWv1 cached all assets that site v1 needed. Thus if main1.js was to lazyload something etc. when user interacted with the page, browser would get that from the cache. As SWv2 has taken control and cached its idea of the assets (these are now newer assets), when main1.js tries to lazyload something originally cached by SWv1 it's not found. Also, because this is now a new deployment, the asset is not on the HTTP server anymore. It would have been in caches handled by SWv1 but SWv2 doesn't know about it. SWv2 knows about a newer version of that file. Page breaks.
It is important to understand that this might not be the case for every site/SW combination. If you have very little logic in the SW script and the main.js doesn't communite with sw.js too much it is possible to build a combination where skipWaiting and clientsClaim don't cause any problems. You can also code in such a way that if an error happens, you'll show the user a notification to refresh.

Accounting for users that have left website without using onunload

I have a webservice with very limited resources (I will be able to handle about 3 simultaneous users).
When users interact with my website they start a complex process server-side. (This process is the limiting factor, as my server machine will not be able to handle many in parallel, and clients cannot run this on their side.)
My question is how to make sure to end the process for users that leave, for example by closing the window.
I have considered onunload and onbeforeunload, but they are also triggered by links within the website (which I need for users to be able to interact with the process) so that does not seem like an option.
This approach seems problematic according to other questions (see this, for example), but it could work if there were a way to check if the user is still an active user when performing the action triggered by onunload (even if in a different page of the website), but I don't know how to do this.
I have also considered periodically checking the list of active users and cancelling the process for users that have left, but I don't know if this is even possible.
I have zero experience with cookies, but could this be a place to use them? Can the server access the (still living) cookies of disconnected users?
Which sounds like a reasonable approach for this problem?
Cases such as these are generally handled by heartbeats. Have your client send periodic heartbeats (which are essentially pings) to the server notifying that it is still alive and interested in the process's results. And the server automatically kills those processes for which it hasn't received client heartbeat for a configured amount of time.
I have considered onunload and onbeforeunload
You are right- you can't rely on them.
I have zero experience with cookies, but could this be a place to use them?
No. Cookies maintain client-side state that is sent to a server on HTTP calls. So, servers don't manage cookies. Instead, they only look at them to identify state.

Zero downtime/blue-green deployment of Single Page Application (SPA)

Yesterday together with the team we were discussing the possibility of using zero downtime deployments to support our single page application.
While discussing it we identified one edge case for it.
After user loads the page in his browser it cannot be removed from memory until he reloads the page. It means that if user loads the page and starts working with the website (for example starts typing a long article like I am doing now) then he cannot receive an updated version of it until he reloads the page.
We could ignore the fact that user sees old application version in his browser but there 2 points listed below.
In case we introduce a breaking change to HTTP Api that is used to serve spa then the user will not be able to save his article (data loss!) or can receive some other error when performing other backend related action.
When user navigates to a new page without reloading SPA he can receive a template of the next page or of some control that is incompatible with outer old container. It can kead to broken markup or application logic.
We cannot force user to relogin as he can be in the middle of typing his article and it is just a bad UX.
Taking all theses points into account one could propose the following solution:
User 1 loads v1 of the SPA into his browser.
Alongside with auth token the version information is sent to browser (using JWT for example).
We want to deploy v2 version of our application. We spin up the v2 version but do not disable v1.
User 2 loads v2 of SPA into his browser
User 1 goes to the next page in SPA. Load balancer checks the version information in his token and routes the traffic of the user 1 to v1 server.
User 2 gets routed in the same way to v2.
User 1 logs out the app and closes the browser.
User 1 logs in back - this time he receives v2.
After v1 application does not receive any traffic for a long time it gets disposed.
In this approach however it is possible to have multiple versions alive, more than 2 (for example if user stays online for whe whole day or two). It means that we will not be able to migrate the database to the new schema until the last user gets logged out (image how it could work for sites like Facebook). It is not a problem to have multiple versions however, such tools as Docker and Rancher allow us to do it easily.
Also in the step 7. User needs to reload the page or close the browser-otherwise he still will be working with v1 and we cannot force him to the next version.
The question I have is what approach do you use to do zero downtime/blue-green deployment of single page applications?
How do you manage the lifetime of "blue" version of your application when you are switching traffic to "green" version, especially in respect to existing "blue" client applications.
Did you solve these issues, do you know any other solution?
I've been struggling with this problem for quite some time and tried several approaches and one specifically worked really well:
Use hashed names when bundling the SPA (including images, et al)
Use a static asset bucket (e.g.: AWS S3) and upload all assets to it before the deployment process kicks in
Enforce internal guidelines to minimize API contracts to be broken (i.e: fields from an endpoint should only be removed after X releases)
Deploy with usual blue/green strategy
Rationale
Using a bucket with hashed bundles ensures that if a customer gets the old version of the SPA, all of its assets will be available before/during/after any deployment process.
Enforcing internal guidelines to not break API compatibility is sometimes tricky but it comes from the very same principles applied to any public API. Embracing/adapting an API deprecation policy from big players helps when communicating with the team with a concrete example.
One approach you might consider is gradual reloading of the SPA in such a moment, when it is not burdensome (or even unnoticeable) for end user.
Suggested approach:
Colored versions of the system (components providing back-end services, API and front-end) "know" (runtimes are provided with) their "color". Component providing users with front-end application embeds this color information into the SPA. This is then sent (via cookie or custom HTTP header) with every request SPA is making to the backend.
Component that routes API calls (API gateway, load balancer, nginx, HAproxy, custom Zuul-based router etc) is aware of this color information and uses it to direct traffic to infrastructure of proper color.
Additionally there is a public URL (not provided by "colored" infrastructure - for example S3 file provided via CloudFront or other proxy) with latest version color. SPA is checking this version every given period of time (60 or 120 seconds). If version does not match the one SPA was provided when loading then on the major next route change page is reloaded "physically", instead of realizing this navigation in browser only.
You can choose which route changes are verifying this version in such a way that it is least obtrusive to the user (possibly almost unnoticeable).
If you choose some of the routes that are used every day by all users then pretty soon all users will migrate to the latest color. Those who have unused opened browser window for long periods of time (computer hibernated for two weeks?) can be handled by forcing reload after certain period of inactivity.
I hope I managed to make myself sound at last a bit cohesive :-)
Regards,
Wojtek
Not sure why would you go for a complete overhaul of your UI since their is always a learning curve involved.Practically in real world it would be a bad idea to switch over to a new UI immediately. You would allow customers to switch over to the new interface over a period of time and then disable older version after a forewarning. Not worth the effort of having such real time switch. A/B testing could be a way to introduce customers to the new interface and then do an actual rollout.
The technique you're describing is called blue-green deployment; You start with your existing server (blue) and add your updated server (green). All new traffic from that point on is redirected to the green environment. The blue environment is only there for servicing existing http connections and also for an optional "roll back" in case the green environment hits major problems. Eventually the "blue" environment can be retired when it has finished servicing all of its requests.
This technique requires that the two systems be somewhat similar. Database schema for instance may make it inpractical.

How to properly handle asynchronous database replication?

I'm considering using Amazon RDS with read replicas to scale our database.
Some of our controllers in our web application are read/write, some of them are read-only. We already have an automated way for identifying which controllers are read-only, so my first approach would have been to open a connection to the master when requesting a read/write controller, else open a connection to a read replica when requesting a read-only controller.
In theory, that sounds good. But then I stumbled open the replication lag concept, which basically says that a replica can be several seconds behind the master.
Let's imagine the following use case then:
The browser posts to /create-account, which is read/write, thus connecting to the master
The account is created, transaction committed, and the browser gets redirected to /member-area
The browser opens /member-area, which is read-only, thus connecting to a replica. If the replica is even slightly behind the master, the user account might not exist yet on the replica, thus resulting in an error.
How do you realistically use read replicas in your application, to avoid these potential issues?
I worked with application which used pseudo-vertical partitioning. Since only handful of data was time-sensitive the application usually fetched from slaves and from master only in selected cases.
As an example: when the User updated their password application would always ask master for authentication prompt. When changing non-time sensitive data (like User Preferences) it would display success dialog along with information that it might take a while until everything is updated.
Some other ideas which might or might not work depending on environment:
After update compute entity checksum, store it in application cache and when fetching the data always ask for compliance with checksum
Use browser store/cookie for storing delta ensuring User always sees the latest version
Add "up-to-date" flag and invalidate synchronously on every slave node before/after update
Whatever solution you choose keep in mind it's subject of CAP Theorem.
This is a hard problem, and there are lots of potential solutions. One potential solution is to look at what facebook did,
TLDR - read requests get routed to the read only copy, but if you do a write, then for the next 20 seconds, all your reads go to the writeable master.
The other main problem we had to address was that only our master
databases in California could accept write operations. This fact meant
we needed to avoid serving pages that did database writes from
Virginia because each one would have to cross the country to our
master databases in California. Fortunately, our most frequently
accessed pages (home page, profiles, photo pages) don't do any writes
under normal operation. The problem thus boiled down to, when a user
makes a request for a page, how do we decide if it is "safe" to send
to Virginia or if it must be routed to California?
This question turned out to have a relatively straightforward answer.
One of the first servers a user request to Facebook hits is called a
load balancer; this machine's primary responsibility is picking a web
server to handle the request but it also serves a number of other
purposes: protecting against denial of service attacks and
multiplexing user connections to name a few. This load balancer has
the capability to run in Layer 7 mode where it can examine the URI a
user is requesting and make routing decisions based on that
information. This feature meant it was easy to tell the load balancer
about our "safe" pages and it could decide whether to send the request
to Virginia or California based on the page name and the user's
location.
There is another wrinkle to this problem, however. Let's say you go to
editprofile.php to change your hometown. This page isn't marked as
safe so it gets routed to California and you make the change. Then you
go to view your profile and, since it is a safe page, we send you to
Virginia. Because of the replication lag we mentioned earlier,
however, you might not see the change you just made! This experience
is very confusing for a user and also leads to double posting. We got
around this concern by setting a cookie in your browser with the
current time whenever you write something to our databases. The load
balancer also looks for that cookie and, if it notices that you wrote
something within 20 seconds, will unconditionally send you to
California. Then when 20 seconds have passed and we're certain the
data has replicated to Virginia, we'll allow you to go back for safe
pages.

Resources