I've built a Progressive Web App that uses caching, but it's unclear to me whether users can (accidentally or on purpose) clear the service worker cache, which may clear my tracking data.
When a user clears their browsing data / cookies, this clears all site storage, which includes the SW cache, cookies, local storage, IndexedDB, and any other local caching system.
Furthermore, Ctrl-F5 forces a hard refresh: it bypasses the HTTP cache (and the controlling service worker) for that load and retrieves the content from the servers again, though it does not by itself delete the service worker's Cache Storage entries.
"Clear site data" in Chrome 76 will delete the caches and the worker, however the deleted worker remains "activated and running". So that's a case that needs dealing with.
Related
Whenever I have an issue with a website, one of the first suggestions I will hear is “try to clear your browser cache” along with “and delete your cookies”. So what is this browser cache? What does it store and what is it good for?
I have googled but didn't find a proper answer. I'd appreciate it if anyone can help with this.
A browser cache "caches" (as in keeps local copies) of data downloaded from the internet. The next time your browser needs the same data it can get it from the cache (fast) instead of downloading it over the internet (slow)
The problem is that data can be old. For example imagine the browser cached www.nytimes.com today and 24hrs later you visited www.nytimes.com again. If the browser loaded the cached data it would be old news.
So there are headers (metadata) that the servers send to browser telling them how long they should cache something (if at all).
The data the browser generally caches are "requests" which. In other words if your browser asks for "http://foo.com/bar.html" the first time the browser will "request" that "foo.com" send it "bar.html". If the headers from "foo.com" are set a certain way the browser will the save a local copy of "bar.html". If you request the same thing again the browser may load "bar.html" from it's cache. I say "may" because it depends on the headers sent from the server. The server can say how long (say 10 minutes, 10 hours, 10 days, etc..) or it can say "don't cache this at all, always download the newest version".
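To see this in practice, here's a small snippet you can paste into a browser console; the URL is just a placeholder:

```ts
// Inspect the response headers that control caching. Typical Cache-Control
// values: "max-age=600" (keep for 10 minutes), "no-store" (never cache),
// "no-cache" (cache, but revalidate with the server before reuse).
const response = await fetch("https://example.com/bar.html");
console.log(response.headers.get("cache-control"));
console.log(response.headers.get("expires")); // older expiry mechanism
console.log(response.headers.get("etag"));    // validator for revalidation
```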
If you go to your browser's dev tools (Chrome, for example) and look at the Network tab (it may be called something else in other browsers), then load the page again, you can see all the requests. You'll also notice which ones were loaded from the cache.
If you click on a request you can see the metadata from both the browser (request headers) and the server (response headers).
The reason clearing the cache often fixes things is that for some reason the server (a bug?) said it was OK to cache, but the data on the server has actually been updated. The browser, doing what the server told it to do, is using its copy from the cache, not the newer version which is actually needed. There might also, from time to time, be bugs in the browser itself related to caching.
When everything is working correctly it's great, but if one thing or another is misconfigured or sending the wrong headers, the browser can end up loading old data from the cache instead of downloading the newest data. Clearing your cache effectively forces the browser to download the data again.
You can find out the details of what the various headers do here.
Browser caches are not mere rubbish bins but a mechanism to speed up the way we browse the web. Every website we visit has certain common elements like logos, navigation buttons, GIF animation files, script files, etc. It doesn't make sense for the browser to download each element (also commonly called Temporary Internet Files) as we hop from one page to another and back.
The page elements are downloaded when we first visit a website, and the browser checks its cache folder for copies as we browse the site. If a copy exists, the browser doesn't download the same file again, significantly speeding up web browsing.
For more info:
http://www.guidingtech.com/8925/what-are-browser-cache-cookies-does-clearing-them-help/
https://en.wikipedia.org/wiki/Cache_(computing)
This is the first result in Google, and it is the proper answer, but I will summarize =]
1) What is Browser Cache?
A cache is a component that stores data so future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation, or a duplicate of data stored elsewhere.
2) What does it store?
Web browsers and web proxy servers employ web caches to store previous responses from web servers, such as web pages and images.
3) What is it good for?
Web caches reduce the amount of information that needs to be transmitted across the network, as information previously stored in the cache can often be re-used. This reduces bandwidth and processing requirements of the web server, and helps to improve responsiveness for users of the web.
I'm considering using Amazon RDS with read replicas to scale our database.
Some of our controllers in our web application are read/write, some of them are read-only. We already have an automated way for identifying which controllers are read-only, so my first approach would have been to open a connection to the master when requesting a read/write controller, else open a connection to a read replica when requesting a read-only controller.
In theory, that sounds good. But then I stumbled upon the concept of replication lag, which basically means that a replica can be several seconds behind the master.
Let's imagine the following use case then:
The browser posts to /create-account, which is read/write, thus connecting to the master
The account is created, transaction committed, and the browser gets redirected to /member-area
The browser opens /member-area, which is read-only, thus connecting to a replica. If the replica is even slightly behind the master, the user account might not exist yet on the replica, thus resulting in an error.
How do you realistically use read replicas in your application, to avoid these potential issues?
I worked with an application that used pseudo-vertical partitioning. Since only a handful of the data was time-sensitive, the application usually fetched from the replicas, and from the master only in selected cases.
As an example: when the user updated their password, the application would always ask the master during the authentication prompt. When changing non-time-sensitive data (like user preferences), it would display a success dialog along with a note that it might take a while until everything is updated.
Some other ideas which might or might not work, depending on the environment:
After an update, compute the entity's checksum, store it in the application cache, and when fetching the data always verify it against that checksum (see the sketch after this list)
Use browser storage/cookies to store a delta, ensuring the user always sees the latest version
Add an "up-to-date" flag and invalidate it synchronously on every replica node before/after an update
Whatever solution you choose, keep in mind it's subject to the CAP theorem.
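To make the checksum idea concrete, here's a rough TypeScript sketch; the cache and database clients (appCache, replicaDb, masterDb) are invented placeholders, not a real API:

```ts
import { createHash } from "node:crypto";

// Hypothetical stand-ins for the application cache and the two databases.
declare const appCache: { get(key: string): Promise<string | null> };
declare const replicaDb: { findUser(id: string): Promise<object> };
declare const masterDb: { findUser(id: string): Promise<object> };

const checksum = (entity: object) =>
  createHash("sha1").update(JSON.stringify(entity)).digest("hex");

// On write, store checksum(updatedUser) under `user:${id}:checksum`.
// On read, trust the replica only if its copy matches that checksum.
async function readUser(id: string) {
  const expected = await appCache.get(`user:${id}:checksum`);
  const fromReplica = await replicaDb.findUser(id);
  if (!expected || checksum(fromReplica) === expected) return fromReplica;
  return masterDb.findUser(id); // replica is lagging; fall back to the master
}
```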
This is a hard problem, and there are lots of potential solutions. One potential solution is to look at what Facebook did:
TL;DR - read requests get routed to the read-only copy, but if you do a write, then for the next 20 seconds all your reads go to the writable master.
The other main problem we had to address was that only our master databases in California could accept write operations. This fact meant we needed to avoid serving pages that did database writes from Virginia because each one would have to cross the country to our master databases in California. Fortunately, our most frequently accessed pages (home page, profiles, photo pages) don't do any writes under normal operation. The problem thus boiled down to, when a user makes a request for a page, how do we decide if it is "safe" to send to Virginia or if it must be routed to California?
This question turned out to have a relatively straightforward answer. One of the first servers a user request to Facebook hits is called a load balancer; this machine's primary responsibility is picking a web server to handle the request but it also serves a number of other purposes: protecting against denial of service attacks and multiplexing user connections to name a few. This load balancer has the capability to run in Layer 7 mode where it can examine the URI a user is requesting and make routing decisions based on that information. This feature meant it was easy to tell the load balancer about our "safe" pages and it could decide whether to send the request to Virginia or California based on the page name and the user's location.
There is another wrinkle to this problem, however. Let's say you go to editprofile.php to change your hometown. This page isn't marked as safe so it gets routed to California and you make the change. Then you go to view your profile and, since it is a safe page, we send you to Virginia. Because of the replication lag we mentioned earlier, however, you might not see the change you just made! This experience is very confusing for a user and also leads to double posting. We got around this concern by setting a cookie in your browser with the current time whenever you write something to our databases. The load balancer also looks for that cookie and, if it notices that you wrote something within 20 seconds, will unconditionally send you to California. Then when 20 seconds have passed and we're certain the data has replicated to Virginia, we'll allow you to go back for safe pages.
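Stripped to its essence, the cookie trick is just a timestamp comparison. Here's a minimal sketch (all names are invented; in Facebook's case this logic lived in the layer-7 load balancer, not in application code):

```ts
const STICKY_MS = 20_000; // pin reads to the master for 20s after a write

// Called for every request: decide where this user's reads may go.
function pickTarget(lastWriteCookie: string | undefined): "master" | "replica" {
  const lastWrite = Number(lastWriteCookie ?? 0);
  return Date.now() - lastWrite < STICKY_MS ? "master" : "replica";
}

// Called after every successful write: stamp the cookie with the current
// time so the next STICKY_MS of reads are routed to the master.
function stampWriteCookie(setCookie: (name: string, value: string) => void) {
  setCookie("lastWrite", String(Date.now()));
}
```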
If you are running a Rails 3 app with multiple web dynos on Heroku:
Every time you hit the app, do you typically connect with a different web dyno?
Can sessions work across different web dynos?
Does it work for different Rails session stores (ActionDispatch::Session::CookieStore, ActiveRecord::SessionStore, and ActionDispatch::Session::CacheStore)?
In short, yes - sessions will work across multiple web dynos.
Sessions work across web dynos because Rails' design of session support allows it to. If anything, the web dyno model is exactly how Rails was intended to be scaled horizontally.
1. Every time you hit the app, do you typically connect with a different web dyno?
Based on heroku documentation:
The routing mesh is responsible for determining the location of your application’s web dynos within the dyno manifold and forwarding the HTTP request to one of these dynos. Dyno selection is performed using a random selection algorithm.
So dyno selection is random... but that dyno has to have your application installed. So if you have more than one dyno, you may end up connecting to a different dyno each time (which is important, as this facilitates load balancing and high availability).
2. Can sessions work across different web dynos?
Yes. Most web stacks support sessions by doing the following:
Assigning a session ID - a unique ID, usually set as a session cookie so that the browser will always send it with ANY HTTP request to the originating host
Providing storage which maps the session id to the actual session data
So by this process, sessions can be supported, as every inbound HTTP request carries the session ID, which is accessible to the web dyno when it handles your request.
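Here's a toy sketch of those two steps using Node's built-in http module. The Map stands in for the session storage; on Heroku it would have to be something shared across dynos (cookie, database, or memcached), as covered below:

```ts
import http from "node:http";
import { randomUUID } from "node:crypto";

// Stand-in for shared session storage (DB/memcached in a real multi-dyno app).
const sessions = new Map<string, { hits: number }>();

http.createServer((req, res) => {
  // Step 1: find or assign the session ID carried by a cookie.
  let sid = req.headers.cookie?.match(/sid=([\w-]+)/)?.[1];
  if (!sid || !sessions.has(sid)) {
    sid = randomUUID();
    sessions.set(sid, { hits: 0 });
    res.setHeader("Set-Cookie", `sid=${sid}; HttpOnly`);
  }
  // Step 2: map the ID to the actual session data.
  const session = sessions.get(sid)!;
  session.hits += 1;
  res.end(`Requests in this session: ${session.hits}\n`);
}).listen(3000);
```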
3. Does it work for different Rails session stores (ActionDispatch::Session::CookieStore, ActiveRecord::SessionStore, and ActionDispatch::Session::CacheStore)
ActionDispatch::Session::CookieStore
Yes. The cookie store stores encrypted session data as a cookie. So your browser sends all the session data (encrypted) back to the host, which is then decrypted for use within your app.
ActiveRecord::SessionStore
Yes. The ActiveRecord store keeps session data in a database table. An ID is then assigned and set as a cookie. So your browser sends the ID to the host, which is then used to load the session data from the database. Since all web dynos have a connection to the DB, this is also supported.
ActionDispatch::Session::CacheStore
Yes, but you need a cache store service (e.g. the MemCache add-on). The cache store keeps session data in a cache (memcached), which is a shared service across all web dynos. An ID is then assigned and set as a cookie. So your browser sends the ID to the host, which is then used to load session data from the cache store (memcached).
I do not believe Heroku makes any effort to send consecutive requests to the same web dyno. I might be wrong and they might make some effort, but even if they do, it isn't likely to be anything like reliable enough to count on for session management.
However, ActionDispatch::Session::CookieStore will definitely work because the data is stored in an encrypted client-side cookie. ActiveRecord::SessionStore will work because the data is stored in the database, which is presumably shared by all web dynos. ActionDispatch::Session::CacheStore should work if you use a memcached server shared between all clients, or a similar shared cache.
The only thing that wouldn't work is some sort of file-based session storage on the local filesystem, and situations like multiple Heroku dynos are exactly why that type of session storage is not common in modern web applications.
What is the most efficient way to make an ASP.NET MVC application web-farm ready?
Most importantly, sharing the current user's information (Context) and (not so important) cached objects such as look-up items (states, street types, counties, etc.).
I have heard of/read about MemCache but haven't seen a simple, applicable way (documentation) to implement and test it.
Request context
Any request that hits a web farm gets served by an available IIS server. Context gets created there and the whole request gets served by the same server, so context shouldn't be a problem. A request is a stateless execution pipeline, so it doesn't need to share data with other servers in any way, shape or form. It will be served from beginning to end by the same machine.
User information is read from a cookie and processed by the server that serves the request. It then depends on whether you cache the complete user object somewhere.
Session
If you use the TempData dictionary, you should be aware that it's stored inside the Session dictionary. In a server farm, that means you should use something other than InProc sessions, because those aren't shared between IIS servers across the farm. You should configure a different session manager that uses either a DB or something else (State Server, etc.).
Cache
When it comes to cache, it's a different story. To make it as efficient as possible, the cache should be shared as well. By default it's not. But a cache miss simply means the data gets read from the source and stored in the cache, so if a particular server in the farm doesn't have some object cached, it will create it. In time, all of them will cache some of the shared, publicly used data.
Or... you could use libraries like memcached (as you mentioned) and take advantage of a shared cache. There are several examples on the net of how to use it.
But these solutions all bring additional overhead (like network hops, out-of-process processing, data fetching, etc.) if nothing else. So the default cache is the fastest, and if you explicitly need a shared cache, then decide on one. Don't share cache unless really necessary.
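As a sketch of that trade-off, here's a read-through lookup that prefers the fast per-server cache and only then pays for the shared cache or the database; the sharedCache and db clients are invented placeholders:

```ts
// Hypothetical stand-ins for a shared cache (e.g. memcached) and the DB.
declare const sharedCache: {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
};
declare const db: { loadStates(): Promise<string> };

const localCache = new Map<string, string>(); // per-server, fastest

async function getStates(): Promise<string> {
  const key = "lookup:states";
  const local = localCache.get(key);
  if (local) return local; // no network hop

  const shared = await sharedCache.get(key); // extra hop, but shared
  if (shared) {
    localCache.set(key, shared);
    return shared;
  }

  const fresh = await db.loadStates(); // miss everywhere: read and store
  await sharedCache.set(key, fresh);
  localCache.set(key, fresh);
  return fresh;
}
```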
I have a rails application where I would like to use both memcached and the file store cache, for different purposes.
I want to use the file store cache to keep a large number of pages that don't change often (some not at all) - i.e. page caching - and use memcached for everything else (action and DB caching etc). The reason is that the pages stored on the file store cache are likely to require a large amount of storage, but individually most will be accessed infrequently.
Is this possible to do or will configuring memcached as the cache mean that it is also used for page caching?
As a secondary question, what is a safe way to remove pages from the file store cache in some form of cron job, given there does not seem to be an option to specify a TTL for this cache? For example, a UNIX find command would quickly find and remove all old pages, or pages that haven't been accessed in a long time. Is this safe to do given the app server might be trying to serve one of those pages at that moment (though this is very unlikely)? If not, what is the best way to do this?
If you want to use the filesystem only for page caching and memcached for action and fragment caching, you're fine. Page caching always uses the filesystem. Just remember that page caching bypasses your Rails application, so you can't use it for pages that include content that changes from user to user or for pages that are access controlled with filters.
Regarding the removal of pages, on Unix, a file can be deleted, but it is not actually removed from disk until all open file handles are closed. If the app server has opened the file to serve a request, and the find command deletes it a split-second later, the app server doesn't suddenly get an error when it tries to read.
You could also consider having find delete files based on their last access time, instead of creation or modification, and using a sweeper in your Rails app to delete the cached page when its content is out of date.
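If you'd rather run the sweep from a script than a raw find command, here's a rough Node sketch of the same idea; the path and cutoff are placeholders, and it assumes the filesystem actually records access times (no noatime mount):

```ts
import { readdirSync, statSync, unlinkSync } from "node:fs";
import { join } from "node:path";

const CACHE_DIR = "/var/www/app/public/cache"; // placeholder path
const MAX_IDLE_MS = 30 * 24 * 60 * 60 * 1000;  // 30 days

for (const name of readdirSync(CACHE_DIR, { recursive: true })) {
  const path = join(CACHE_DIR, String(name));
  const stats = statSync(path);
  // Delete by last *access* time so frequently served pages survive;
  // per the answer above, an unlink won't break a request already reading it.
  if (stats.isFile() && Date.now() - stats.atimeMs > MAX_IDLE_MS) {
    unlinkSync(path);
  }
}
```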
A simpler approach may be to use an HTTP cache upstream of your application as your page cache, rather than two stores within Rails. This way you can use HTTP headers to control the cache behavior, including TTLs. These same limits will also apply to browsers' local caches as a nice bonus.
Varnish is about as high performance as it gets, but would require setting up another moving piece in your hosting environment as a proxy. This may still be worthwhile depending on what you're doing.
A simpler approach might be Rack::Cache, which will be easy to set up provided you're using a Rack-enabled version of Rails.