How does HTML5 AppCache handle redirects?

If I include in my application cache manifest:
/example.html
and this redirects to
https://s3.amazonaws.com/longURL/example.html?dynamicauthenticationparameters
will this work?
The current draft HTML5 specification seems to be silent on redirects for content files (as opposed to the manifest itself) apart from referring to a manual redirect flag, which apparently is set but (as far as I can tell) never actually used.
(The intention is to avoid proxying some S3 content, but to still make it available offline using the cache mechanism. JavaScript and LocalStorage would presumably be a workaround if the above can't be done).
Any pointers to the relevant part of a spec and/or current browser implementation behavior would be helpful.

The current specification now states that if fetching a resource results in a redirect, this is treated as a failure, and the locally cached copy (or fallback) is used instead.
In section 5.6.4 of http://www.w3.org/TR/2011/WD-html5-20110525/offline.html it states that:
Redirects are fatal because they are either indicative of a network
problem (e.g. a captive portal); or would allow resources to be added
to the cache under URLs that differ from any URL that the networking
model will allow access to, leaving orphan entries; or would allow
resources to be stored under URLs different than their true URLs. All
of these situations are bad.
So, sadly, you can't serve some pages from Amazon S3 or CloudFront by redirecting to them.

Related

Cherry-pick: from two URLs with same file on different CDNs, load whichever is in cache

I have a web app that wants to load bootstrap.min.js
It's on these two CDNs (among others):
https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.3.1/js/bootstrap.min.js
https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js
The odds of a cache hit from some other app using these CDNs are relatively high.
How can I tell the browser to check if they are cached and load from the browser cache?
Can a service worker do this?
I believe that there are some privacy/security restrictions in place that make it difficult to determine, using JavaScript, whether a third-party URL is present in the browser's cache.
Adding a service worker into the mix will not get around those restrictions.
It's possible to use the Fetch API to create a request whose cache mode is 'only-if-cached', which will behave more or less in the way you describe, but that only works if the request's mode is 'same-origin'. In other words, it only works for first-party URLs, not for third-party CDN URLs like the ones in your example.
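A rough sketch of what that looks like (the function name and URL are illustrative; this is the browser Fetch API, so the example is JavaScript):
// Ask for a resource only if it is already in the HTTP cache; never hit the network.
async function loadFromHttpCacheOnly(url) {
  try {
    const response = await fetch(url, {
      cache: 'only-if-cached', // don't go to the network
      mode: 'same-origin'      // 'only-if-cached' is only allowed for same-origin requests
    });
    return response.ok ? response : null;
  } catch (err) {
    // A cache miss (or a cross-origin URL) surfaces as an error instead of a network fetch.
    return null;
  }
}
loadFromHttpCacheOnly('/js/bootstrap.min.js'); // works for a first-party URL, not a CDN URL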

"Request has expired" when using S3 with Active Storage

I'm using ActiveStorage for the first time.
Everything works fine in development, but in production (Heroku) my images disappear for no apparent reason.
They displayed fine at first, but now no image is shown. In the console I can see this error:
GET https://XXX.s3.amazonaws.com/variants/Q7MZrLyoKKmQFFwMMw9tQhPW/XXX 403 (Forbidden)
If I try to visit that URL directly I get an XML error response:
<Error>
  <Code>AccessDenied</Code>
  <Message>Request has expired</Message>
  <X-Amz-Expires>300</X-Amz-Expires>
  <Expires>2018-07-24T13:48:25Z</Expires>
  <ServerTime>2018-07-24T15:25:37Z</ServerTime>
  <RequestId>291D41FAC6708334</RequestId>
  <HostId>lEVGuwA6Hvlm/i40PeXaje9SEBYks9+uk6DvBs=</HostId>
</Error>
This is what I have in the view:
<div class="cover" style="background-image: url('<%= rails_representation_path(experience.thumbnail) %>')"></div>
This is what I have in the model:
def thumbnail
  self.cover.variant(resize: "300x300").processed
end
In simple words, I don't want the image URLs to expire; I want them to always be available.
Thanks
ActiveStorage does not support non-expiring links. It uses expiring (private) links, and supports uploading files only as private on your service.
This was a problem for me too, so I wrote two patches (use with caution) for S3 only: a simple one (~30 lines) that overrides ActiveStorage to work only with non-expiring (public) links, and another that adds an acl option to the has_one_attached and has_many_attached methods.
Hope it helps.
Your question doesn't say so, but it's common to use a CDN like AWS CloudFront with a Rails app. Especially on Heroku you probably want to conserve compute power.
Here is what happens in that scenario. You render a page as usual, and all the images are requested from the asset host, which is the CDN, because that's how it is configured to integrate. It's set up to fetch anything it doesn't find in its cache from the origin, which is your application again.
At first, all image requests are passed through: the ActiveStorage controller creates signed URLs for them, and the CDN passes them on, but also caches them.
Now comes the problem. The signed URL expires in 5 minutes by default, but the CDN usually caches for much longer. That's because you typically use digested assets, which are invalidated not by time but by name, on any change.
The solution is simple. Increase the expiry of the signed URL to be longer than the cache's TTL. Now the cache drops the cached signed URL before it becomes invalid.
Set the URL expiry using ActiveStorage::Service.url_expires_in in Rails 5.2, or directly via Rails.application.config.active_storage.service_urls_expire_in in an initializer; see this answer for details.
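For example, a minimal initializer sketch (the one-week value is just an assumption; pick anything longer than your CloudFront behavior's TTL):
# config/initializers/active_storage.rb
# Keep signed ActiveStorage URLs valid for longer than the CDN caches them.
Rails.application.config.active_storage.service_urls_expire_in = 1.week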
To set the cache TTL in CloudFront, open the AWS console, pick the distribution, open the Behaviors tab, and scroll down to the TTL fields (Minimum TTL, Default TTL, and Maximum TTL).
Then optionally issue an invalidation to force re-caching of all contents.
Keep in mind there is a security trade-off. If the image contents are private, then they most likely don't belong in a CDN, and shouldn't have long-lasting temporary URLs either. In that case, choose a solution that exempts attachments from the CDN altogether. Your application will then have to handle the additional load of signing all attached assets' URLs on top of rendering the relevant page.
Further, keep in mind that this isn't necessarily a good solution, but more of a workaround. With the above setup you will cache redirects, and the heavier requests will still hit your storage bucket directly. The usual scenario for CDNs is serving large media, not lightweight redirects. You do relieve the app of handling a lot of requests, though; whether that is a worthwhile optimization is something to evaluate.
I had this same issue, but the problem was resolved after I corrected the time on my computer. It was a clock difference that the AWS servers did not accept.
In config/environments/production.rb, change
config.active_storage.service = :local
to
config.active_storage.service = :amazon
The :amazon key should match whatever service name you defined in config/storage.yml.

How to set up app to disallow Cloudfront from fetching anything?

I use Rails 5.2 and CloudFront for assets.
How do I set up the app to disallow CloudFront from fetching anything except assets?
CloudFront doesn't have an explicit way to "allow only" certain path prefixes, since requests will ultimately match the default * cache behavior if they don't match any other behavior. There are several ways of working around this, depending on the level of sophistication and complexity that suits your taste, but all of them start with this step:
Create a new cache behavior using the desired path pattern, such as /assets/*, and select your existing origin to handle these requests.
At this point, CloudFront still works as before, it's just internally considering the asset requests to match one behavior and everything else to match the other.
So, what we need next is something different for the "other."
The simplest solution is to create a second Origin, using the Origin Domain Name invalid.invalid. This is a syntactically valid hostname that points to a nonexistent target (the .invalid TLD is reserved for such purposes).
After creating this origin, edit your default cache behavior to use this new origin.
With this change in place and propagated, CloudFront will process /assets/* requests as before, but will throw an error on any other path. (The error is 502 Bad Gateway, if I remember correctly).
This accomplishes the simple purpose of blocking all other requests.
If you want to be a bit more proactive, and actually redirect requests back to the main site, you can accomplish this by creating an empty bucket in S3 and selecting the "Redirect requests" option. In the "target bucket or domain" box, put your main website hostname. Then take the "Endpoint" shown in the Static website hosting box and use that as your origin hostname in CloudFront for the default cache behavior. Any request that arrives at CloudFront for anything other than /assets/* will receive a redirect back to the main site.
This option may be the better option if your CDN has been inadvertently picked up by search engines, because the links will redirect back to the main site.

SSL path in appcache network being restricted in Chrome

Seeing some odd behaviour in Chrome, and not sure if it's expected behaviour when using appcache, or just Chrome.
It's a single-page app powered by our REST API. It works fine when the API is requested over HTTP, but as soon as we change the URL to the HTTPS version it stops working. There's not a lot of (i.e. any) information in Chrome's console as to why it decides to stop working.
We've managed to narrow it down to the NETWORK section in the appcache file. The only way we can get it to work is to use the * wildcard, which we don't want to do, as that bypasses the whole point of the appcache and reduces security (from my understanding of the docs, etc.).
We've tried any and all variations of the API URL (combinations of it with wildcards in various relevant locations), but none seem to work (even https://* doesn't allow a successful request).
Does anyone experienced with this know what's going on?
Thanks
Need a bit of clarification (see my comment), but in the meantime:
The NETWORK behaviour of the manifest is really there to, according to the spec, make "the testing of offline applications simpler", by reducing the difference between online and offline behaviour. In reality, it just adds another gotcha.
By default, anything that isn't explicitly in the manifest (listed in the manifest file), implicitly part of the cache (a visited page that points to the manifest), or covered by a FALLBACK prefix will fail to load, even if you're online, unless the URL is listed in the NETWORK section or the NETWORK section lists *.
Wildcards don't have special meaning in the NETWORK section: if you list http://whatever.com/* it will allow requests to that exact URL, as an asterisk is a valid character in a URL. The only special case is a single *, which means "allow the page to make network requests for any resources that aren't in the cache".
Basically, using * in NETWORK isn't a security risk; in fact it's probably what you want to do, and every AppCache site I've built uses it.
I drew this flow chart to try and explain how appcache loads pages and resources.

How to use multiple caches in rails?

I have a Rails application where I would like to use both memcached and the file store cache, for different purposes.
I want to use the file store cache to keep a large number of pages that don't change often (some not at all) - i.e. page caching - and use memcached for everything else (action and DB caching etc). The reason is that the pages stored on the file store cache are likely to require a large amount of storage, but individually most will be accessed infrequently.
Is this possible to do, or will configuring memcached as the cache store mean that it is also used for page caching?
As a secondary question, what is a safe way to remove pages from the file store cache in some form of cron job, given that there does not seem to be an option to specify a TTL for this cache? For example, a UNIX find command would quickly find and remove all old pages, or pages that haven't been accessed in a long time. Is this safe to do given that the app server might potentially try to serve one of those pages at the same time (though this is very unlikely)? If not, what is the best way to do it?
If you want to use the filesystem only for page caching and memcached for action and fragment caching, you're fine. Page caching always uses the filesystem. Just remember that page caching bypasses your Rails application, so you can't use it for pages that include content that changes from user to user or for pages that are access controlled with filters.
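A rough sketch of that split (assuming a Rails version where page caching is available, i.e. built in before Rails 4 or via the actionpack-page_caching gem afterwards; the memcached address and controller names are illustrative):
# config/environments/production.rb
# Fragment and action caching go to memcached; page caching ignores this setting.
config.cache_store = :mem_cache_store, "localhost:11211"
# Page caching writes whole HTML files to disk; public/ is the conventional location.
config.action_controller.page_cache_directory = Rails.root.join("public")

# app/controllers/pages_controller.rb
class PagesController < ApplicationController
  caches_page :show # writes e.g. public/pages/1.html after the first request

  def show
    @page = Page.find(params[:id])
  end
end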
Regarding the removal of pages, on Unix, a file can be deleted, but it is not actually removed from disk until all open file handles are closed. If the app server has opened the file to serve a request, and the find command deletes it a split-second later, the app server doesn't suddenly get an error when it tries to read.
You could also consider having find delete files based on their last access time, instead of creation or modification, and using a sweeper in your Rails app to delete the cached page when its content is out of date.
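For Rails versions that still ship cache sweepers (Rails 2.x/3.x; extracted to a separate gem later), a sketch might look like this (PageSweeper and the Page model are illustrative names, not from the question):
# app/sweepers/page_sweeper.rb
class PageSweeper < ActionController::Caching::Sweeper
  observe Page # watch the Page model for changes

  def after_update(page)
    expire_page(controller: "pages", action: "show", id: page.id)
  end

  def after_destroy(page)
    expire_page(controller: "pages", action: "show", id: page.id)
  end
end
The sweeper is then hooked into the writing controller with cache_sweeper :page_sweeper, only: [:update, :destroy].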
A simpler approach may be to use an HTTP cache upstream of your application as your page cache, rather than two stores within Rails. This way you can use HTTP headers to control the cache behavior, including TTLs. As a nice bonus, the same limits will also apply to browsers' local caches.
Varnish is about as high-performance as it gets, but would require setting up another moving piece in your hosting environment as a proxy. This may still be worthwhile depending on what you're doing.
A simpler option might be Rack::Cache, which is easy to set up provided you're using a Rack-enabled version of Rails.
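A rough sketch of that setup (assuming Rails 3.1+ with the rack-cache gem available; the controller and one-hour expiry are illustrative):
# config/environments/production.rb
config.action_dispatch.rack_cache = true # puts Rack::Cache into the middleware stack

# app/controllers/pages_controller.rb
class PagesController < ApplicationController
  def show
    @page = Page.find(params[:id])
    # Sends Cache-Control: public, max-age=3600, which Rack::Cache, Varnish, and browsers all honor.
    expires_in 1.hour, public: true
  end
end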
