Save on demand urls to cache

I'm learning Workbox and I want to cache some article URLs for X days, but I don't know how to do it.
I can handle URLs that I know using precacheAndRoute.
Example:
precacheAndRoute([
{url: '/index.html', revision: '...'},
{url: '/contact.html', revision: '...'},
])
Now I want to cache, on demand, URLs whose paths I don't know in advance. This is because my project is a blog and each post has its own path.
My proposed scenario is:
A user opens an article, and that article is cached for 30 days so they can view it offline later.

What you're after is called runtime caching. It works as you describe: content is cached as the user navigates through the website. Afterwards the content is available for offline viewing.
Runtime caching may be implemented with different strategies. For example, a strategy can serve data only from the cache, from the cache or the network depending on which responds first, or from the cache first while updating it in the background. There are several strategies, and they can be configured manually to fit your needs.
Reading: https://developers.google.com/web/tools/workbox/modules/workbox-strategies#what_are_workbox_strategies, https://developers.google.com/web/fundamentals/instant-and-offline/offline-cookbook, https://web.dev/runtime-caching-with-workbox/
Advice: before implementing anything READ A LOT. That way you can grasp the concepts before you try anything. It might also be that you find something you never thought about in the beginning.
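To make the "cache on demand, expire after 30 days" idea concrete, here is a minimal sketch in plain JavaScript. It is not Workbox code: the '/articles/' prefix, the ExpiringCache class and the Map-based store are assumptions made for illustration. In Workbox itself the same idea is expressed, roughly, by a route registered with registerRoute that uses a CacheFirst strategy together with an ExpirationPlugin and its maxAgeSeconds option.

```javascript
// Hypothetical values: the '/articles/' prefix and the class name are
// illustrative and do not come from the question.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Only article pages are cached on demand.
function isArticle(pathname) {
  return pathname.startsWith('/articles/');
}

class ExpiringCache {
  constructor(maxAgeMs) {
    this.maxAgeMs = maxAgeMs;
    this.entries = new Map(); // url -> { body, storedAt }
  }

  // Store a response body the first time the user visits the page.
  put(url, body, now = Date.now()) {
    this.entries.set(url, { body, storedAt: now });
  }

  // Return the cached body, or undefined if it is missing or too old.
  match(url, now = Date.now()) {
    const entry = this.entries.get(url);
    if (!entry) return undefined;
    if (now - entry.storedAt > this.maxAgeMs) {
      this.entries.delete(url); // expired: evict so it is fetched again
      return undefined;
    }
    return entry.body;
  }
}
```

A service worker would call match first and fall back to the network (then put) on a miss, which is the cache-first behaviour described above.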

Related

Cache-first Service Worker: how to bypass cache on updated assets?

Here is the scenario:
You have a site that is currently cached via a SW. You deploy a new version that includes an updated SW with a cache-busting version. The company then announces the new features. People visit the site; however, even though the new SW busts the cache, the old SW still serves up the previous cache while the cache is updated in the background. So visitors who come for the new features don't see them.
Is this the expected experience with ServiceWorkers? What are the recommended strategies to get around this?

It's the expected behavior whenever you serve resources with a cache-first strategy, yes.
There are two options:
1. Don't use a cache-first strategy. Unfortunately, you lose out on most of the performance benefits of service workers if you use a network-first strategy. I wouldn't recommend going network-first if you can help it.
2. Adopt the UX pattern of displaying a "Reload for the latest updates" toast message on the screen, letting the user know that the cached content has been refreshed and allowing them to take action to see the latest content. This is, I think, the best approach. If you're using a service worker which gets updated whenever your cached content changes (e.g., one generated by sw-precache), then you can detect these updates by listening for specific service worker controller events, and use those to trigger the message.
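A rough sketch of the toast trigger, assuming the update is detected through the controllerchange event on navigator.serviceWorker. The function name notifyOnUpdate and the showToast callback are hypothetical; the container is passed in as a parameter so the logic does not depend on a browser global.

```javascript
// `container` is expected to behave like navigator.serviceWorker;
// `showToast` is a hypothetical callback that renders the message.
function notifyOnUpdate(container, showToast) {
  // If no worker controlled the page at load time, the first
  // controllerchange is the initial install, not an update.
  let hadController = Boolean(container.controller);
  container.addEventListener('controllerchange', () => {
    if (hadController) showToast('Reload for the latest updates');
    hadController = true;
  });
}

// In a page: notifyOnUpdate(navigator.serviceWorker, showToast);
```

Whether controllerchange fires promptly depends on how the new worker activates (for example, whether it skips waiting), so treat this as one possible wiring rather than the only one.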

Storage of user data

When looking at how websites such as Facebook store profile images, the URLs seem to use randomly generated values. For example, the profile picture on Google's Facebook page has the following URL:
https://scontent-lhr3-1.xx.fbcdn.net/hprofile-xft1/v/t1.0-1/p160x160/11990418_442606765926870_215300303224956260_n.png?oh=28cb5dd4717b7174eed44ca5279a2e37&oe=579938A8
However why not just organise it like so:
https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png
Clearly this would be much easier in terms of storage and simplicity. Am I missing something? Thanks.

Companies like Facebook have fairly intense CDNs. The URLs may look randomly generated, but they aren't: each individual route is deliberate and programmed to be handled in that manner.
They aren't after simplicity of storage the way you would be if you were just using FTP to connect to a basic marketing website server. While you may put all your images in an /images folder, Facebook is much too complex for this: dozens of different types of applications access hundreds if not thousands of CDNs and servers worldwide.
If you ever build a web app, such as a Ruby on Rails app, and you work with a service such as AWS (Amazon Web Services), you'll also encounter what seem like nonsensical URLs. But it's all part of the fast delivery network provided within the architecture. Every time you "push" your app up to the server, new URLs are generated automatically for each unique resource: CSS files, JavaScript files, image files, etc. are all created dynamically. You don't have to type in each of these unique URLs every time you publish the app; the code simply knows where to look for them as part of the publishing process.
Example: you tell the web app to look for
//= require jquery
and it returns you http://example.com/assets/jquery-eb3e278249152b5b5d5170b73d9dbf52.js?body=1 in your header.
It doesn't matter that the url is more complex than it should be, the application recognizes it, and that's all that matters.

Simply put, I think it can boil down to two main reasons: Security and Cache:
Security - Adding these long unpredictable hashes prevents others from guessing photo URLs and makes it pretty hard to download photos you aren't supposed to.
Consider what would happen if I could easily guess your profile photo URL and download it, even when you explicitly chose to share it only with friends.
Cache - by adding "random" query params to each photo, you make sure each photo instance gets its own URL. Thus you can store the photo in browser's cache for a long time, knowing that whenever you replace it with a new one, the new photo will have a fresh URL and the browser won't keep showing you the old photo.
If you were to keep the same URL for each user's profile photo (e.g. https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png), and then upload a new photo, either one of these can happen:
If you stored the photo in browser's cache for a long time, the browser will keep showing you the cached version (as long as URL is the same, and cache hasn't expired, there's no need to re-download the image).
If, instead, you only keep the image in cache for a short period of time, you end up hitting your server much more than actually needed, increasing the load and hurting performance.
I hope this clarifies it.
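As a small illustration of the cache point only, the sketch below derives a version parameter from the photo's bytes, so a new upload automatically gets a new URL. The URL layout and the function names are hypothetical, and the hash is a simple non-cryptographic FNV-1a: it changes when the content changes, but it does nothing for the security point, which would need an unguessable or signed value.

```javascript
// Non-cryptographic FNV-1a hash: enough to change the URL when the
// bytes change, NOT enough to make the URL unguessable.
function fnv1a(bytes) {
  let hash = 0x811c9dc5;
  for (const byte of bytes) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Hypothetical URL layout, based on the scheme proposed in the question.
function photoUrl(base, profileId, photoBytes) {
  return `${base}/${profileId}/50x50.png?v=${fnv1a(photoBytes)}`;
}
```

With URLs like this the image can be served with a very long cache lifetime, because a replaced photo never reuses an old URL.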

With your route scheme, how would you prevent strangers from accessing the pictures of a private account? The hash also prevents bots from downloading all the pictures.

I get your pain :-) Rather than dwell on how this problem arises, let me describe a solution. Code that deals with hashed or even base64-encoded values usually looks like a mess, but once each value comes with an identifier that explains it, not much of the mess remains.
I used to work at a company where we collated Facebook posts: using the Graph API, we fetched each post's Insights object and extracted information from it so that it was easy to pass around within the UI and send back to our Redis cache store. Once we had defined in TaffyDB how the objects would be organized, everything made sense, because we could query the useful parts out of a long, junk-looking stream of minified JavaScript.
Refer: http://www.taffydb.com/

The extra values in the URL are useful to:
Track access. This is like when a newspaper appends "&homepage" vs. "&email" to an article URL, so their system knows how a reader found the page.
Avoid abuse and control access. Imagine that a user uploaded a small, popular pornographic image as a profile image. They could then hijack the CDN as a free web host for their porn site. The code in the URL is used internally by the CDN to limit the number of views and prevent this.

Optimising Rails dynamic JSON CPU/memory intensive operations (caching?)

We've built a dynamic questionnaire with an Angular front-end and a RoR backend. Since there are a lot of dynamic parts to handle, it was impossible to utilise ActionView or jbuilder cache helpers. With each Questionnaire request, there are quite a lot of queries to be done, such as checking validity of answers, checking dependencies, etc. Is there a recommended strategy to cache dynamic JSON responses?
To give an idea..
controller code:
def advance
  # Decrypt and parse parameters
  request = JSON.parse(decrypt(params[:request]))
  # Process passed parameters
  if request.key?('section_index')
    @result_set.start_section(request['section_index'].to_i)
  elsif request.key?('question_id')
    if valid_answer?(request['question_id'], request['answer_id'])
      @result_set.add_answer(request['question_id'],
                             request['answer_id'],
                             request['started_at'],
                             request['completed_at'])
    else
      return invalid_answer
    end
  end
  render_item(@result_set.next_item)
end
The next_item could be a question or section, but progress indicator data and possibly a previously given answer (navigation is possible) are returned as well. Also, data is sent encrypted from and to the front-end.
We've also built an admin area with an Angular front-end. In this area, results from the questionnaire can be viewed and compared. Quite a few queries are run to find subquestions, comparable questions, etc., which we found hard to cache. With multiple simultaneous users clicking around, you could fill up the server's memory.
The app is deployed on Passenger and we've fine-tuned the config based on the server configuration. The results are stored in a Postgres database.
TLDR: In production, we found out memory usage becomes an issue. Some optimisations to queries (includes specifically) are possible, but is there a recommended strategy for caching dynamic JSON responses?

Without much detail as to how you are storing and retrieving your data, it is a bit tough. But what it sounds like you are saying is that your next_item method is CPU and memory intensive to try to find the next item. Is that correct? Assuming that, you might want to take a look at a Linked List. Each node (polymorphic) would have a link to the next node. You could implement it as a doubly linked list if you needed to step forward and backwards.
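A minimal sketch of the doubly linked list idea, written in JavaScript for illustration rather than as Rails model code. The ItemNode class and linkItems function are hypothetical names, and the item shape is assumed because the question does not show its data model.

```javascript
// Hypothetical item shape; the question does not show its data model.
class ItemNode {
  constructor(item) {
    this.item = item; // a question or a section
    this.prev = null;
    this.next = null;
  }
}

// Link an ordered array of items into a doubly linked list and
// return the first node (or null for an empty array).
function linkItems(items) {
  let head = null;
  let tail = null;
  for (const item of items) {
    const node = new ItemNode(item);
    if (tail) {
      tail.next = node;
      node.prev = tail;
    } else {
      head = node;
    }
    tail = node;
  }
  return head;
}
```

Finding the next or previous item is then a single pointer lookup instead of a search, which is the saving this answer is hinting at.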

How often does the data change? If you can cache big parts of it and can find a trigger attribute (e.g. updated_at), you can do fragment caching in the view or, even better, HTTP caching in the controller. You can mix both.
It's a bit complex. Please have a look at http://www.xyzpub.com/en/ruby-on-rails/4.0/caching.html
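To show what HTTP caching keyed on a trigger attribute amounts to, here is a language-neutral sketch in JavaScript of a conditional GET. The record shape ({ id, updatedAt }) and the function names are assumptions for illustration. In Rails the equivalent logic is provided by the stale? and fresh_when controller helpers, so this is not code you would write by hand there.

```javascript
// Derive a validator from the trigger attribute. The record shape
// ({ id, updatedAt }) is a hypothetical stand-in for a model.
function etagFor(record) {
  return `"${record.id}-${record.updatedAt.getTime()}"`;
}

// Return 304 with no body when the client's copy is still current;
// otherwise build the (expensive) JSON and send it with the ETag.
function respond(record, ifNoneMatch, buildJson) {
  const etag = etagFor(record);
  if (ifNoneMatch === etag) {
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  return { status: 200, headers: { ETag: etag }, body: buildJson(record) };
}
```

The saving comes from the 304 branch: the expensive JSON is never built when the record has not changed since the client last fetched it.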

Basic database (MongoDB) performance question

I'm building a web app for bookmark storage with a directory system.
I've already got these collections set up:
Path(s)
---> Directories (embedded documents)
---> Links (embedded documents)
User(s)
So performance wise, should I:
- add the user id to the created path
- embed the whole Paths collection into the specific user
I want to pick option 2, but yeah, I dunno...
EDIT:
I was also thinking about making the whole interface ajaxified. So, that means I'll load the directories and links from a specific path (from the logged in user) through ajax. That way, it's faster and I don't have to touch the user collection. Maybe that changes things?
Like I've said in the comments, 1 huge collection in the whole database seems kinda strange. Right?

MongoDB is designed to tolerate redundant (denormalized) data. I would recommend the second option: if you embed the Paths collection into the specific user, a single query returns all the data about the user together with the related path data.
If you follow the first option, you have to run two separate queries to get all the data, which adds some work.
That said, MongoDB keeps data in RAM, so after reading from one collection you can hold the result in a cursor and use it to fetch data from the other collection. Performance-wise, I don't think the difference will be large.

RE: the edit. If you are going to store everything in a single doc and use embedded docs, then when you make your queries make sure you just select the data you need, otherwise you will load the whole doc including the embedded docs.
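A sketch of "select only the data you need" for the embedded layout. The schema here is hypothetical (a user document with an embedded paths array whose elements have a name field); the helper only builds the filter and projection objects, so adapt the field names to the real documents.

```javascript
// Hypothetical schema: a user document with an embedded `paths` array,
// each path holding `name`, `directories` and `links`.
function pathQuery(userId, pathName) {
  return {
    filter: { _id: userId, 'paths.name': pathName },
    // 'paths.$' keeps only the first array element matched by the filter.
    options: { projection: { 'paths.$': 1 } },
  };
}

// With the official Node.js driver this would be used roughly as:
//   const { filter, options } = pathQuery(id, '/work');
//   const doc = await users.findOne(filter, options);
```

Without the projection, the same filter would return the whole user document, including every embedded path, directory and link.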

Why would Google Search use client-side URL parameters?

Yesterday morning I noticed Google Search was using hash parameters:
http://www.google.com/#q=Client-side+URL+parameters
which seems to be the same as the more usual search (with search?q=Client-side+URL+parameters). (It seems they are no longer using it by default when doing a search using their form.)
Why would they do that?
More generally, I see hash parameters cropping up on a lot of web sites. Is it a good thing? Is it a hack? Is it a departure from REST principles? I'm wondering if I should use this technique in web applications, and when.
There's a discussion by the W3C of different use cases, but I don't see which one would apply to the example above. They also seem undecided about recommendations.

Google has many live experimental features that are turned on/off based on your preferences, location and other factors (probably random selection as well.) I'm pretty sure the one you mention is one of those as well.
What happens in the background when a hash is used instead of a query string parameter is that it queries the "real" URL (http://www.google.com/search?q=hello) using JavaScript, then it modifies the existing page with the content. This will appear much more responsive to the user since the page does not have to reload entirely. The reason for the hash is so that browser history and state is maintained. If you go to http://www.google.com/#q=hello you'll find that you actually get the search results for "hello" (even if your browser is really only requesting http://www.google.com/) With JavaScript turned off, it wouldn't work however, and you'd just get the Google front page.
Hashes are appearing more and more as dynamic web sites are becoming the norm. Hashes are maintained entirely on the client and therefore do not incur a server request when changed. This makes them excellent candidates for maintaining unique addresses to different states of the web application, while still being on the exact same page.
I have been using them myself more and more lately, and you can find one example here: http://blixt.org/js -- If you have a look at the "Hash" library on that page, you'll see my implementation of supporting hashes across browsers.
Here's a little guide for using hashes for storing state:
How?
Maintaining state in hashes implies that your application (I'll call it application since you generally only use hashes for state in more advanced web solutions) relies on JavaScript. Without JavaScript, the only function of hashes would be to tell the browser to find content somewhere on the page.
Once you have implemented some JavaScript to detect changes to the hash, the next step would be to parse the hash into meaningful data (just as you would with query string parameters.)
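The parsing step can be sketched as follows, assuming the hash uses the same key=value format as a query string. The function name parseHashState and the render callback in the comment are hypothetical.

```javascript
// Turn '#q=hello' or '#?state=7001' into a plain object of strings.
function parseHashState(hash) {
  // Drop the leading '#', and an optional '?' right after it.
  const raw = hash.replace(/^#\??/, '');
  return Object.fromEntries(new URLSearchParams(raw));
}

// In a browser, changes can be detected with the hashchange event:
//   window.addEventListener('hashchange', () => {
//     render(parseHashState(location.hash)); // `render` is hypothetical
//   });
```

Note that URLSearchParams decodes '+' as a space, which matches how the search example encodes its query.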
Why?
Once you've got the state in the hash, it can be modified by your code (or your user) to represent the current state in your application. There are many reasons for why you would want to do this.
One common case is when only a small part of a page changes based on a variable, and it would be inefficient to reload the entire page to reflect that change (Example: You've got a box with tabs. The active tab can be identified in the hash.)
Other cases are when you load content dynamically in JavaScript, and you want to tell the client what content to load (Example: http://beta.multifarce.com/#?state=7001, will take you to a specific point in the text adventure.)
When?
If you had a look at my "JavaScript realm" you'll see a border-line overkill case. I did it simply because I wanted to cram as much JavaScript dynamics into that page as possible. In a normal project I would be conservative about when to do this, and only do it when you will see positive changes in one or more of the following areas:
User interactivity
Usually the user won't see much difference, but the URLs can be confusing
Remember loading indicators! Loading content dynamically can be frustrating to the user if it takes time.
Responsiveness (time from one state to another)
Performance (bandwidth, server CPU)
No JavaScript?
Here comes a big deterrent. While you can safely rely on 99% of your users to have a browser capable of using your page with hashes for state, there are still many cases where you simply can't rely on this. Search engine crawlers, for example. While Google is constantly working to make their crawler work with the latest web technologies (did you know that they index Flash applications?), it still isn't a person and can't make sense of some things.
Basically, you're at a crossroads between compatibility and user experience.
But you can always build a road in between, which of course requires more work. In less metaphorical terms: implement both solutions, so that for every client-side URL there is a server-side URL that outputs the relevant content. Compatible clients would be redirected to the hash URL. This way, Google can index "hard" URLs, and when users click them, they get the dynamic state stuff!

Recently Google also stopped serving direct links in search results, offering redirects instead.
I believe both have to do with gathering usage statistics, what searches were performed by the same user, in what sequence, what of the search results the user has followed etc.
P.S. Now, that's interesting: direct links are back. I definitely remember seeing only redirects there in the last couple of weeks. They are clearly experimenting with something.
