Workbox: No way to re-validate pre-cached items? - offline-caching

I am using Workbox to make a CMS site offline-capable. I have the service worker installing and pre-caching site pages, assets, and an offline page. The problem is that with pre-caching, there appears to be no way to update the cache without modifying the service worker. In my case, it's a CMS, so authors will not be re-building this service worker when updating site content. I assumed I could just use the StaleWhileRevalidate strategy, but to my dismay, there is no way to provide a caching strategy to precacheAndRoute(), or to precache() and addRoute(). This seems like an oversight by the Workbox team, but just to be sure, I'd like feedback on how one might implement this without changing the service worker every time a site page is updated.
Thanks!

Answered my own question. Posting here in case anyone finds themselves in a similar situation.
According to https://developers.google.com/web/tools/workbox/modules/workbox-precaching#serving_precached_responses, the following information tipped me off to the solution:
The order in which you call precacheAndRoute() or addRoute() is important. You would normally want to call it early on in your service worker file, before registering any additional routes with registerRoute(). If you did call registerRoute() first, and that route matched an incoming request, whatever strategy you defined in that additional route will be used to respond, instead of the cache-first strategy used by workbox-precaching.
So I dropped precacheAndRoute() in favor of just precache(). Then I called registerRoute() with the StaleWhileRevalidate strategy, matching those assets. The important part was specifying the precache as the cacheName:
new workbox.strategies.StaleWhileRevalidate({
  cacheName: workbox.core.cacheNames.precache
})
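For reference, here's a minimal sketch of a service worker wired up this way (the manifest entry and URL matcher below are placeholders, not part of the original setup):

// Precache without registering the default cache-first route.
workbox.precaching.precache([
  // revision: null assumes the URL itself is versioned (e.g. hashed filenames)
  { url: '/index.html', revision: null }
]);

// Serve those same URLs stale-while-revalidate, reading and writing the precache.
workbox.routing.registerRoute(
  ({ url }) => url.pathname === '/index.html', // placeholder matcher
  new workbox.strategies.StaleWhileRevalidate({
    cacheName: workbox.core.cacheNames.precache
  })
);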
Pretty simple in hindsight, but the documentation wasn't clear at first about what seemed like something that should be built right into precacheAndRoute().

Related

Aren't PWAs user-unfriendly if the service worker is not immediately active?

I posted another question as a brute-force solution to this one (Angular: fully install service worker before anything else), but I thought I'd make a separate one to discuss the use case where a service worker is used as intended.
According to the service worker lifecycle (https://developers.google.com/web/fundamentals/primers/service-workers/lifecycle), the SW is installed, but it only becomes active once you reload the page (you can claim() the page, but that only applies to calls made after the service worker is installed). The reasoning is that if an existing version is updated, the old one and the new one should not mix states and caches. I can agree with that decision.
What I have trouble understanding is why it is not immediately active once it is initially installed. Instead, it requires a page reload unless you explicitly define precaching rules in the SW. If you define caching rules with wildcards, it's not possible to precache those, so you need the reload.
Given a single-page PWA (like Angular), a user will discover the site and browse around on it, but the page will never be reloaded during that session. If they then want to use the site offline later, they need to have refreshed or re-opened the tab at least one other time. That seems like a pretty big pitfall to me.
Am I missing something here?
Your understanding of the service worker lifecycle is correct but I do not think the pitfall you mentioned is as severe as you think it is.
If I understand you correctly, the user experience will only be negatively affected if the user loses connectivity during the initial browsing of the page (before the service worker is active) and is missing an offline asset. If this is truly a scenario you want to account for, then that offline asset can be pre-cached by the browser-side JavaScript. Alternatively, as you mentioned, you can skipWaiting() and claim() to make the service worker active without the user refreshing the page.
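For completeness, a minimal sketch of that second option, using the standard service worker lifecycle APIs:

// sw.js: activate the new service worker without waiting for a page reload.
self.addEventListener('install', (event) => {
  self.skipWaiting(); // don't wait for old clients to close
});

self.addEventListener('activate', (event) => {
  event.waitUntil(self.clients.claim()); // take control of already-open pages
});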

Is there a canonical pattern for caching something related to a session on the server?

In my Rails app, once per user session, I need to have my server send a request to one of our other services to get some data about the user. I only want to make this request once per session because pinging another service every time the user makes a request will significantly slow down our response time. However, I can't store this information in a cookie client-side. This information has some security implications - if the user has the ability to lie to our server about what this piece of information is, they can gain access to data they're not authorized to see.
So what is the best way to cache or store a piece of data associated with a session on the Rails server?
I'm considering using Rails low-level caching, and I think it might even be correct:
# Look up the value for this session in the cache; on a miss, call the
# other service once and cache the result for 12 hours.
Rails.cache.fetch(session.id, expires_in: 12.hours) do
  OtherServiceAPI.get_sensitive_data(user.id)
end
I know that Rails often has one canonical way of doing things, though, so I want to be sure there's not a built-in, officially preferred way to associate a piece of data with a session. This question makes it look like there are potential pitfalls using the approach I'm considering as well, although it looks like those concerns may have been made obsolete in newer versions of Rails.
Is there a canonical pattern for what I'm trying to do? Or is the approach I'm considering idiomatic enough?

Why are self.skipWaiting() and self.clients.claim() not default behaviour for service workers?

I'm researching service workers for my thesis. I understand how the lifecycle works, but I'm having trouble understanding the default update behaviour of service workers.
When installing a new service worker while an old one is installed, the new service worker will have to wait before activating. With self.skipWaiting() and self.clients.claim() it is possible to activate the new service worker immediately and take control of the pages. I don't get why this is not the default behaviour. The main reason I can find is to preserve code and data consistency (https://redfin.engineering/service-workers-break-the-browsers-refresh-button-by-default-here-s-why-56f9417694). With some basic understanding of the lifecycle, shouldn't it be possible to preserve both code and data consistency when a service worker updates, or am I missing something? Are there any additional reasons?
Also, has this behaviour been different in the past? Were skipWaiting() and clients.claim() added later?
The default, as it is now, is safer in general and doesn't force everyone to come up with all sorts of workarounds. Consider this sequence:
1. User loads the page with main1.js. SWv1 registers 1 second later; the site is now fully cached.
2. User loads the page again, this time from cache by SWv1, super fast. A new SWv2 registers 1 second later, caches the new assets (main1.js is now main2.js), and takes control via skipWaiting and clientsClaim.
Two things can happen now:
1. The page has loaded with main1.js and the browser has executed whatever that script said. The user has interacted with the page, etc. The page is running main1.js, which expects to be talking to SWv1, but the SW actually in control is SWv2. The script, main1.js, could be sending messages and trying to interact with the SW in a way that only SWv1 understood and v2 knows nothing about. Now the page breaks because of the mismatch.
2. SWv1 cached all the assets that site v1 needed. Thus, if main1.js were to lazy-load something when the user interacted with the page, the browser would get it from the cache. But since SWv2 has taken control and cached its own idea of the assets (the newer ones), when main1.js tries to lazy-load something originally cached by SWv1, it isn't found. Also, because this is a new deployment, the asset is no longer on the HTTP server either. It would have been in the caches handled by SWv1, but SWv2 doesn't know about it; SWv2 only knows about a newer version of that file. The page breaks.
It is important to understand that this might not be the case for every site/SW combination. If you have very little logic in the SW script and main.js doesn't communicate with sw.js too much, it is possible to build a combination where skipWaiting and clientsClaim don't cause any problems. You can also code in such a way that if an error happens, you show the user a notification to refresh, as sketched below.
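As a sketch of that last suggestion (showRefreshBanner is a hypothetical UI helper; the rest is the standard registration API):

// Page script: detect that a new service worker is installed and waiting.
navigator.serviceWorker.register('/sw.js').then((reg) => {
  reg.addEventListener('updatefound', () => {
    const newWorker = reg.installing;
    newWorker.addEventListener('statechange', () => {
      // A controller already exists, so this is an update rather than a first install.
      if (newWorker.state === 'installed' && navigator.serviceWorker.controller) {
        showRefreshBanner(); // hypothetical: prompt the user to reload for the new version
      }
    });
  });
});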

Using Workbox to manage caches with several service worker clients

I recently implemented service workers on our site with Workbox. Due to the structure of our project, we're implementing a service worker for each page, for instance:
/foo/XXX/
/foo/XYZ/
/foo/XXY/
As a result, we end up creating a separate service worker for each of these pages.
At the same time, we're using precaching in our build process to precache CSS and JS assets.
I know Workbox creates two caches, one for precaching and one for runtime. Because we have several service workers, our customers get a new cache entry each time they visit a new page:
workbox-precache-https://www.example.com/foo/XXX-https://www.example.com
workbox-precache-https://www.example.com/foo/XYZ-https://www.example.com
workbox-precache-https://www.example.com/foo/XXY-https://www.example.com
I know Workbox provides an option to set the cache name details:
workbox.core.setCacheNameDetails({
  prefix: 'my-app',
  suffix: 'v1',
  precache: 'custom-precache-name',
  runtime: 'custom-runtime-name'
});
My question is: can I use this option to give all the service workers a single, shared cache name? My idea is that, with all assets in the same cache, Workbox will be in charge of removing duplicates and managing the cache. Does that make sense?
Thanks a lot
If you call workbox.core.setCacheNameDetails({suffix: 'my-suffix'}) at the very start of your service worker script, and you do that for each service worker registered on your origin, that would be enough to have all of the service workers use a common cache for their precached assets. (Normally the scope of the current service worker is used as the suffix, to prevent collisions and ensure that each service worker gets its own cache, so you'd be overriding that behavior.)
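In other words, something like this at the top of every per-page sw.js (the suffix value here is made up):

// Must run before any precaching or routing calls, in each service worker.
workbox.core.setCacheNameDetails({ suffix: 'shared-precache' });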
But... I'd be hesitant to actually do this, or at least to test thoroughly before you do, as you're opening yourself up to possible issues. Some things that I'd worry about:
Normally, the install and activate service worker lifecycle events are used to trigger downloading new assets (install) and deleting out-of-date assets (activate). The activate step will, by default (unless you're using skipWaiting), not fire until all tabs with active clients are closed, to ensure that nothing is deleted which is still being used by a tab. If you have multiple service workers, each with their own scope and their own lifecycle events, managing the same cache via precaching, then one service worker's activate event might fire while a tab controlled by a different service worker is still open. This could cause entries to be deleted from the precache while they might still be used by that second tab.
I'd be worried about any relative URLs in your precache manifest, as each of those relative URLs would be resolved using the location of the current service worker as the base. If the different paths of your site have different URL structures, or if /foo/XXX/app.js is fundamentally different from /foo/XYZ/app.js, then an entry of ./app.js in a precache manifest ends up being pretty dangerous if you share a single cache.
What I'd recommend as an alternative, if you really can't go with a single, higher-level service worker, is not to force all the precached assets into a single cache, but instead to maintain separate, potentially smaller precaches for each service worker, and then use runtime caching with a common cacheName parameter to share the resources that you know are common. I think that's much less likely to be error-prone.
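As a rough sketch of that alternative (the matcher and cache name are invented for illustration), each per-page service worker would keep its own precache and add a shared runtime route:

// Runtime caching for assets known to be common across pages.
workbox.routing.registerRoute(
  ({ url }) => url.pathname.startsWith('/shared/'), // hypothetical matcher
  new workbox.strategies.StaleWhileRevalidate({
    cacheName: 'common-assets' // identical name in every service worker
  })
);

Because every service worker names the same runtime cache, a shared asset fetched while visiting one page is available to the others without being precached three times.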

Google script origin request url

I'm developing a Google Sheets add-on. The add-on calls an API. In the API configuration, a URL like https://longString-script.googleusercontent.com had to be added to the list of URLs allowed to make requests from another domain.
Today, I noticed that this URL changed to https://sameLongString-0lu-script.googleusercontent.com.
The URL changed about 3 months after development started.
I'm wondering what makes the URL change, because it also means a configuration change in our back-end every time it does.
EDIT: Thanks for both your responses so far. They helped me understand better how this works, but I still don't know if/when/how/why the URL is going to change.
Quick update: the changing part of the URL was "-1lu" for another user today (but not for me when I was testing). It's quite annoying, since we can't use wildcards in the Google dev console redirect URI field. Am I supposed to paste a lot of "-xlu" URIs, with x from 1 to 10 or so, just so I don't have to touch this for a while?
For people coming across this now: we've also just encountered this issue while developing a Google add-on. We've needed to add multiple origin URLs to our OAuth client for sign-in, following the longString-#lu-script.googleusercontent.com pattern mentioned by the OP.
This is annoying, as each URL has to be entered separately in the authorized URLs field (subdomain or wildcard matching isn't allowed). It's also pretty fragile, since it breaks if Google changes the URLs they host our add-on from. Furthermore, I wasn't able to find any documentation from Google confirming that these are the script origins.
URLs are managed by the host in various ways. At the most basic level, when you build a web server, you decide what to call it and what to call any pages on it. Google and other large content providers, with farms of servers and redundant data centers, manage this a bit differently, but for your purposes it is effectively the same: you need to ask them, since they are the hosting provider of your cloud content.
Something that MIGHT be related is that Google recently rolled out (or at least scheduled) some changes dealing with the googleusercontent.com domain and Picasa images. So the Google support forums will be the way to go with this question for the freshest answers, since the cause of a URL change is usually specific to that moment in time and not something you necessarily need to worry about changing repeatedly. But again, they would need to confirm that it was something related to the recently planned changes... or not. :-)
When you find something out, you can update this question in case it is of use to others, especially if they tell you that it wasn't a one-time thing dealing with a change on their end.
This is more likely related to changing the origin under the Same-origin Policy. As discussed:
A page may change its own origin with some limitations. A script can set the value of document.domain to its current domain or a superdomain of its current domain. If it sets it to a superdomain of its current domain, the shorter domain is used for subsequent origin checks.
For example, assume a script in the document at http://store.company.com/dir/other.html executes the following statement:
document.domain = "company.com";
After that statement executes, the page can pass the origin check with http://company.com/dir/page.html
So, as noted:
When using document.domain to allow a subdomain to access its parent securely, you need to set document.domain to the same value in both the parent domain and the subdomain. This is necessary even if doing so is simply setting the parent domain back to its original value. Failure to do this may result in permission errors.
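Put concretely, a minimal sketch of that last point, using the example domains from the quote:

// In the page served from http://store.company.com (the subdomain):
document.domain = 'company.com';

// And in the page served from http://company.com (the parent), even though
// it looks redundant, the assignment is still required:
document.domain = 'company.com';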
