CDN and URLs with query strings

We have an images folder on our web servers that we may publish via a CDN. Sometimes we append query-string-like syntax to URLs to help us freshen content that has changed, even though it rarely does. Example:
/images/file.png?20090821
Will URLs like this work with your average content delivery network?

Yes. We use Akamai, which keeps a cached copy of each distinct URL requested, including the query string. So the first request for /images/file.png?20090821 will go to the origin server. Requests thereafter for /images/file.png?20090821 will get the image from the Akamai servers. The next day, assuming the img src changes to /images/file.png?20090822, the first request will go to the origin server again.

Amazon CloudFront turned on query-string support in May 2012.

You wouldn't have a problem with the CDN. However, you may have a problem with browsers: some browsers won't cache any content with a query string. Even though it may be fast to fetch the image from the CDN, it will not be as fast as a browser-cached image. So you want to do something like this:
/images/file.png/20090821
Our CDN provider also recommends a hash mechanism: when we publish our content, it adds a hash to the URL so you don't have to add the version yourself. Unfortunately, I don't know the details of how that magic is done.
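For the path-based versioning suggested above, here is a minimal framework-agnostic Ruby sketch (the helper name and images directory are assumed, not from the question); a content hash could be substituted for the modification date, and the web server or CDN still has to map the versioned path back to the real file.

def versioned_image_path(filename, images_dir: "public/images")
  # Use the file's modification date as the version segment, e.g. "20090821".
  version = File.mtime(File.join(images_dir, filename)).strftime("%Y%m%d")
  "/images/#{filename}/#{version}"   # => "/images/file.png/20090821"
end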

Amazon CloudFront won't propagate the query string (note the May 2012 change mentioned above).

Related

"Request has expired" when using S3 with Active Storage

I'm using ActiveStorage for the first time.
Everything works fine in development, but in production (Heroku) my images disappear for no apparent reason.
They showed fine at first, but now no image is displayed. In the console I can see this error:
GET https://XXX.s3.amazonaws.com/variants/Q7MZrLyoKKmQFFwMMw9tQhPW/XXX 403 (Forbidden)
If I try to visit that URL directly I get an XML error:
<Error>
<Code>AccessDenied</Code>
<Message>Request has expired</Message>
<X-Amz-Expires>300</X-Amz-Expires>
<Expires>2018-07-24T13:48:25Z</Expires>
<ServerTime>2018-07-24T15:25:37Z</ServerTime>
<RequestId>291D41FAC6708334</RequestId>
<HostId>lEVGuwA6Hvlm/i40PeXaje9SEBYks9+uk6DvBs=</HostId>
</Error>
This is what I have in the view
<div class="cover" style="background-image: url('<%= rails_representation_path(experience.thumbnail) %>')"></div>
This is what I have in the model
def thumbnail
  self.cover.variant(resize: "300x300").processed
end
In simple words, I don't want the images to expire; I want them to always be there.
Thanks
ActiveStorage does not support non-expiring links. It uses expiring (private) links, and it only supports uploading files to your service as private.
It was a problem for me too, so I wrote two patches (use with caution) for S3 only: a simple one of about 30 lines that overrides ActiveStorage to work only with non-expiring (public) links, and another that adds an acl option to the has_one_attached and has_many_attached methods.
Hope it helps.
Your question doesn't say so, but it's common to use a CDN like AWS CloudFront with a Rails app. Especially on Heroku you probably want to conserve compute power.
Here is what happens in that scenario. You render a page as usual, and all the images are requested from the asset host, which is the CDN, because that's how it is configured to integrate. It's set up to fetch anything it doesn't find in its cache from the origin, which is your application again.
At first, all image requests are passed through: the ActiveStorage controller creates signed URLs for them, and the CDN passes them on, but also caches them.
Now comes the problem. The signed URL expires in 5 minutes by default, but the CDN usually caches for much longer. That's because you usually use digest assets, which are invalidated not by time but by name, on any change.
The solution is simple. Increase the expiry of the signed URL to be longer than the cache's TTL. Now the cache drops the cached signed URL before it becomes invalid.
Set the URL expiry using ActiveStorage::Service.url_expires_in in Rails 5.2, or directly via Rails.application.config.active_storage.service_urls_expire_in in an initializer; see this answer for details.
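For example, a minimal initializer sketch (assuming Rails 5.2 or later; the one-week value is only illustrative and should be longer than the CDN's cache TTL):

# config/initializers/active_storage.rb
Rails.application.config.active_storage.service_urls_expire_in = 1.week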
To set the cache TTL in CloudFront: open the AWS console, pick the distribution, open the Behaviors tab, and scroll down to the TTL settings (Minimum, Maximum, and Default TTL).
Then optionally issue an invalidation to force re-caching of all contents.
Keep in mind there is a security trade-off. If the image contents are private, they most likely don't belong in a CDN and shouldn't have long-lasting temporary URLs either. In that case, choose a solution that exempts attachments from the CDN altogether. Your application will then have to handle the additional load of signing all attached assets' URLs on top of rendering the relevant page.
Also keep in mind that this isn't necessarily a good solution, but more of a workaround. With the above setup you will cache redirects, and the heavier requests will hit your storage bucket directly. The usual scenario for CDNs is large media, not lightweight redirects. You do relieve the app of handling a lot of requests, though; how much of a valid optimization that is should be looked into.
I had this same issue, but after I corrected the time on my computer the problem was resolved. It was a time difference that the AWS servers did not accept.
# config/environments/production.rb
Change
config.active_storage.service = :local
to
config.active_storage.service = :amazon
The name should match whatever you defined it as (aws/amazon) in config/storage.yml.
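For reference, a sketch of the matching entry in config/storage.yml (the region, bucket, and credential lookups here are illustrative):

# config/storage.yml -- the key ("amazon") must match the symbol used above
amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: my-app-bucket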

From the server, how do I force an external file to expire so that the browser receives a fresh one?

I have a show view that uses a 'Universal Viewer' to load images. The image dimensions come from a JSON file served by an IIIF image server.
I fixed a bug and a new JSON file exists, but the user's browser is still using the old info.json file.
I understand that I could just have them do a hard reload, like I did on my own machine, but many users may be affected, and I'm just damn curious now.
Modern browsers all ship with cache-control functionality baked in. Using a combination of ETags and Cache-Control headers, you can accomplish what you seek without having to change file names or use cache-busting query parameters.
ETags allow you to communicate a token to a client that will tell their browser to update the cached version. This token can be created based on the content creation date, content length, or a fingerprint of the content.
Cache-Control headers allow you to create policies for web resources that control how long, by whom, and how your content can be cached.
Using ETags and Cache-Control headers is a useful way to tell clients when to update their cache when serving IIIF or any other content. However, adding ETags and Cache-Control can be quite specific to your local implementation. Many frameworks (like Ruby on Rails) have much of this functionality baked in. There are also web server configurations that may need to be modified; some sample configurations that use these strategies are available from the HTML5 Boilerplate project.
Sample Apache configurations for:
ETags https://github.com/h5bp/server-configs-apache/blob/master/src/web_performance/etags.conf
Cache expiration https://github.com/h5bp/server-configs-apache/blob/master/src/web_performance/expires_headers.conf
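As a rough sketch of the Rails-side support mentioned above (the controller and model names are hypothetical, not from the question): stale? sets the ETag/Last-Modified headers and skips re-rendering when the client's copy is still current, while expires_in sets the Cache-Control max-age.

class ManifestsController < ApplicationController
  def show
    manifest = Manifest.find(params[:id])   # hypothetical model backing info.json
    expires_in 10.minutes, public: true     # Cache-Control: public, max-age=600
    if stale?(etag: manifest, last_modified: manifest.updated_at)
      render json: manifest.as_json         # only rendered when the cached copy is out of date
    end
  end
end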
It depends on where the JSON file is being served from, and how it's being cached.
The guaranteed way to expire the cache on the file is to change the filename every time it changes. This is typically done by renaming it filename-MD5HASH.ext, where MD5HASH is the MD5 hash of the file.
If you can't change the file name (it comes from a source you can't control), you might be able to get away with adding a cache-busting query key to the URL, something like http://example.com/file.ext?q=123456.
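A minimal Ruby sketch of both options just described (the helper names are assumed):

require "digest/md5"

# Rename to filename-MD5HASH.ext so any content change yields a new URL.
def fingerprinted_name(path)
  ext = File.extname(path)
  "#{File.basename(path, ext)}-#{Digest::MD5.file(path).hexdigest}#{ext}"
end

# Fallback when the name can't change: append a cache-busting query key.
def cache_busted_url(url, path)
  "#{url}?q=#{File.mtime(path).to_i}"   # e.g. http://example.com/file.ext?q=123456
end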

PDF caching on Heroku with CloudFlare

I'm having a problem getting the caching I need to work using CloudFlare.
We use CloudFlare for caching all our assets on S3, which works 100% using a separate cdn subdomain.
We also use CloudFlare for our main site (hosted on Heroku), e.g. www.
My problem is I can't get CloudFlare to cache PDFs that are generated from our Rails app. I'm using the WickedPDF gem to dynamically generate certain PDFs for invoices, etc. I don't want to upload these as files to, say, S3, but we would like CloudFlare to cache them so they don't get generated each and every time, as generating these PDFs is a little intensive.
CloudFlare is turned on and is "accelerating" for the subdomain in question and we're using SSL, but PDFs never seem to cache properly.
Is there something else we need to do to ensure these get cached? Or maybe there's another solution that would work for Heroku? (E.g. we can't use page caching since it relies on the filesystem.) I also checked the WickedPDF documentation to see if we could do anything else, but found nothing about expiry controls.
Thanks,
We should actually cache it, as long as the resources are on-domain and not being delivered through a third-party resource in some way.
Keep in mind:
1. Our caching depends on the number of requests for the resources (at least three).
2. Caching is very much data center dependent (in other words, if your site receives a lot of traffic at a data center, it is going to be cached there; if your site doesn't get a lot of traffic in another data center, it may not be cached there).
I would open a support ticket if you're still having issues.
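On the Rails side, one thing worth trying is sending explicit cache headers with the WickedPDF response so an edge cache is allowed to store it. A hedged sketch (controller, model, and TTL are illustrative; whether CloudFlare honours it for your plan and page rules still needs verifying, and publicly caching private invoices has obvious security implications):

class InvoicesController < ApplicationController
  def show
    invoice = Invoice.find(params[:id])    # illustrative model
    expires_in 12.hours, public: true      # Cache-Control: public, max-age=43200
    render pdf: "invoice-#{invoice.id}",   # WickedPDF render option
           template: "invoices/show"
  end
end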

Add expire header to image from database

Does anybody know if it is possible to cache an image that comes from the database?
I know that there is an OutputCache attribute to put above the action. You could then set VaryByParam to the id of the image in the database.
But this would just save the image on the server and not on the client, right?
I was hoping there was something like an expiration header for an image. Can you add that to an image? That way, the client is responsible for deciding whether to ask the server again, which saves a request to the server...
If I'm wrong, please correct me, because I'm new to this kind of caching (OutputCache and expiration headers).
Thanks
Output caching affects the client's caching as well, so this will actually work fine for you.
See my note on caching here:
Disable browser cache for entire ASP.NET website
Someone on that thread thought that output caching was only on the server side as well, but a quick test can tell you otherwise. That doesn't mean there aren't scenarios where it's limited to the server (such as varying by key). I would have one action method responsible only for serving up these files. That method doesn't need to cache by key; just change your Duration to, say, a minute and watch the headers coming down in Fiddler to verify.

URL fingerprint caching on Amazon S3

I have a bucket on Amazon S3 where I keep files that sometimes change but I want to use maximum caching on them, so I want to use URL fingerprinting to invalidate the cache.
I use the "last modified" date of the files for the fingerprint, and the html page requesting the S3 files always knows each file's fingerprint.
Now, I realize that I could use the fingerprint in the query string, like so:
http://aws.amazon.com/bucket/myFile.jpg?v=1310476099061
but the query string is not always enough for some proxies or older browsers to invalidate the cache, and some proxies and browsers don't even cache it if it contains a query string. That's why I want to keep the fingerprint in the actual URL, like one of these:
http://aws.amazon.com/bucket/myFile-1310476099061.jpg
http://aws.amazon.com/bucket/1310476099061/myFile.jpg
http://aws.amazon.com/bucket/myFile.jpg/1310476099061
etc
Any of these URLs would be perfect for requesting myFile.jpg, but I want them all to be remapped to the http://aws.amazon.com/bucket/myFile.jpg file. That is, I only want the URL to change so the browser will think it is a new file and fetch a fresh copy, which it will cache for a year. When I upload a new version of that file, the fingerprint is automatically updated.
Now here is my question: is there any way to rewrite the URL so that a request for a URL like http://aws.amazon.com/bucket/myFile-xxxxxx.jpg will serve the http://aws.amazon.com/bucket/myFile.jpg file on Amazon S3? Or are there any other workarounds that will still keep the file cached? Thanks =)
I'm afraid you're stuck with the version in the query string. There is no way to rewrite the URLs on S3 without actually changing the filename.
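If changing the stored name at publish time is an option, one possible workaround, sketched here with the aws-sdk-s3 gem (the bucket and key names follow the examples above; the region and fingerprint value are illustrative), is to copy the object to a key that carries the fingerprint:

require "aws-sdk-s3"

s3 = Aws::S3::Client.new(region: "us-east-1")
fingerprint = "1310476099061"   # e.g. the file's last-modified stamp

s3.copy_object(
  bucket:      "bucket",
  copy_source: "bucket/myFile.jpg",
  key:         "myFile-#{fingerprint}.jpg"
)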
