fragment caching and expiring amazon s3 links - ruby-on-rails

On the app I am working on, I just implemented fragment caching for a threaded activity flow. Each thread is cached, and I gained almost 50% in processing time. It works fine, but a side effect I didn't anticipate is that the Amazon S3 links to attachments expire.
How can I have the links regenerated on each page refresh while avoiding reloading the whole conversation? In other words, is there some sort of after_filter :regenerate_uploads I can do through Rails or JavaScript?
I'm totally new to caching, so I may be missing some details or clarity; please help me reformulate my question if needed.
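One way to approach this (a sketch, not from the question itself): keep the signed S3 URL out of the cached fragment and link to a small redirect action instead, so a fresh signed URL is generated on every hit while the rest of the conversation stays cached. AttachmentsController, the Attachment model and its s3_object method are hypothetical names; presigned_url comes from aws-sdk-s3.
# Hypothetical sketch: cached fragments link to this action instead of
# embedding a signed S3 URL, so every request gets a freshly signed link.
class AttachmentsController < ApplicationController
  def show
    attachment = Attachment.find(params[:id])  # hypothetical model
    redirect_to attachment.s3_object.presigned_url(:get, expires_in: 300)
  end
end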

Related

"Request has expired" when using S3 with Active Storage

I'm using ActiveStorage for the first time.
Everything works fine in development, but in production (Heroku) my images disappear for no apparent reason.
They were showing up fine at first, but now no image is displayed. In the console I can see this error:
GET https://XXX.s3.amazonaws.com/variants/Q7MZrLyoKKmQFFwMMw9tQhPW/XXX 403 (Forbidden)
If I try to visit that URL directly I get an XML error:
<Error>
<Code>AccessDenied</Code>
<Message>Request has expired</Message>
<X-Amz-Expires>300</X-Amz-Expires>
<Expires>2018-07-24T13:48:25Z</Expires>
<ServerTime>2018-07-24T15:25:37Z</ServerTime>
<RequestId>291D41FAC6708334</RequestId>
<HostId>lEVGuwA6Hvlm/i40PeXaje9SEBYks9+uk6DvBs=</HostId>
</Error>
This is what I have in the view
<div class="cover" style="background-image: url('<%= rails_representation_path(experience.thumbnail) %>')"></div>
This is what I have in the model
def thumbnail
  self.cover.variant(resize: "300x300").processed
end
In simple words, I don't want images to expire; I want them to always be there.
Thanks
ActiveStorage does not support non-expiring links. It uses expiring (private) links and supports uploading files to your service only as private.
It was a problem for me too, so I wrote two patches (use with caution) for S3 only: a simple one of about 30 lines that overrides ActiveStorage to work only with non-expiring (public) links, and another that adds an acl option to the has_one_attached and has_many_attached methods.
Hope it helps.
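A rough sketch of what the simpler kind of override could look like, assuming the S3 objects themselves are publicly readable; object_for is the private helper inside ActiveStorage::Service::S3Service and public_url comes from aws-sdk-s3.
# config/initializers/active_storage_public_urls.rb
# Sketch (use with the same caution as the patches above): make the S3
# service hand out plain, non-expiring object URLs instead of signed ones.
Rails.application.config.to_prepare do
  ActiveStorage::Service::S3Service.class_eval do
    def url(key, **_options)
      object_for(key).public_url  # requires publicly readable objects
    end
  end
end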
Your question doesn't say so, but it's common to use a CDN like AWS CloudFront with a Rails app. Especially on Heroku you probably want to conserve compute power.
Here is what happens in that scenario. You render a page as usual, and all the images are requested from the asset host, which is the CDN, because that's how it is configured to integrate. It's set up to fetch anything it doesn't find in its cache from the origin, which is your application again.
At first, all image requests are passed through: the ActiveStorage controller creates signed URLs for them, and the CDN passes them on but also caches them.
Now comes the problem. The signed URL expires in 5 minutes by default, but the CDN usually caches for much longer. This is because you normally use digest assets, meaning they are invalidated not by time but by name, on any change.
The solution is simple. Increase the expiry of the signed URL to be longer than the cache's TTL. Now the cache drops the cached signed URL before it becomes invalid.
Set the URL expiry using ActiveStorage::Service.url_expires_in in 5.2, or directly via Rails.application.config.active_storage.service_urls_expire_in in an initializer; see this answer for details.
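For example (a sketch; the one-week value is just an illustration and should simply be longer than your CDN's TTL):
# config/initializers/active_storage.rb
# Sketch: let signed ActiveStorage URLs outlive the CDN's cache TTL.
Rails.application.config.active_storage.service_urls_expire_in = 1.week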
To set the cache TTL in CloudFront: open the AWS console, pick the distribution, open the Behaviors tab, and scroll down to the TTL fields (Minimum TTL, Default TTL, Maximum TTL).
Then optionally issue an invalidation to force re-caching of all contents.
Keep in mind there is a security trade-off. If the image contents are private, then they most likely don't belong in a CDN, and shouldn't have long-lasting temporary URLs either. In that case choose a solution that exempts attachments from the CDN altogether. Your application will then have to handle the additional load of signing all attached assets' URLs on top of rendering the relevant page.
Further keep in mind that this isn't necessarily a good solution, but more of a workaround. With the above setup you will cache redirects, and the heavier requests will hit your storage bucket directly. The usual scenario for CDNs is large media, not lightweight redirects. You do relieve the app of handling a lot of requests, though. Whether that is a worthwhile optimization is something to measure.
I had this same issue, but after I corrected the time on my computer the problem was resolved. It was a time difference between my machine and the AWS servers, which made the signatures appear expired.
#production.rb
Change
config.active_storage.service = :local
To
config.active_storage.service = :amazon
The service name should match whatever you defined (aws, amazon, etc.) in storage.yml.
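For reference, a typical storage.yml entry that the name would have to match (a sketch; the bucket, region and credential keys are example values):
# config/storage.yml
amazon:
  service: S3
  access_key_id: <%= Rails.application.credentials.dig(:aws, :access_key_id) %>
  secret_access_key: <%= Rails.application.credentials.dig(:aws, :secret_access_key) %>
  region: us-east-1
  bucket: example-bucket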

PDF caching on Heroku with CloudFlare

I'm having a problem getting the caching I need to work using CloudFlare.
We use CloudFlare to cache all our assets on S3, which works 100% using a separate cdn subdomain.
We also use CloudFlare for our main site (hosted on Heroku), e.g. www.
My problem is I can't get CloudFlare to cache PDFs that are generated from our Rails app. I'm using the WickedPDF gem to dynamically generate certain PDFs for invoices, etc. I don't want to upload these as files to, say, S3, but we would like CloudFlare to cache them so they don't get generated each and every time, as generating these PDFs is a little compute-intensive.
CloudFlare is turned on and is "accelerating" for the subdomain in question and we're using SSL, but PDFs never seem to cache properly.
Is there something else we need to do to ensure these get cached? Or maybe there's another solution that would work on Heroku? (For example, we can't use page caching since it relies on the filesystem.) I also checked the WickedPDF documentation to see if we could do anything else, but found nothing about expiry controls.
Thanks,
We should actually cache it as long as the resources are on-domain and not being delivered through a third-party resource in some way.
Keep in mind:
1. Our caching depends on the number of requests for the resources (at least three).
2. Caching is very much data center dependent (in other words, if your site receives a lot of traffic at a data center it is going to be cached; if your site doesn't get a lot of traffic in another data center it may not cache).
I would open a support ticket if you're still having issues.
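On the Rails side, one thing worth double-checking (a sketch under the assumption of a typical WickedPDF controller action, not something from the answer above): make sure the PDF response itself carries a public Cache-Control header, otherwise the CDN has nothing to hold on to.
# Sketch: mark the generated PDF response as publicly cacheable.
def invoice
  expires_in 1.day, public: true        # sets Cache-Control: max-age=86400, public
  render pdf: "invoice-#{params[:id]}"  # WickedPDF's render option
end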

Caching (optimizing) strategy with API live stream on Rails

So I built a website that uses the Twitch.tv API, which is a game live-streaming site. The requests are long and slow, and I would like to cache them somehow. The problem is that there are a lot of dynamic attributes, such as whether a stream is still online or how many viewers it has. Since the traffic to my website is low at the moment, expiring the cache early isn't going to help much. Also, I have a page that lists all the live streams and makes a request to check whether each stream is online, so even if no one is online it still takes a while to load. Is there any way to retrieve the API faster without caching?
Here is the Twitch.tv API doc.
Since you don't own the Twitch.tv API, unfortunately I would say there is really nothing you can do to make their calls faster.
The good news is that you can cache the calls you make to them, which will make things appear faster to your users.
The way to cache the calls is to create a key and then cache the return JSON from the API. To create the key I would just use the URL you are calling for the API. Then just give the cached value an expiration time of a few minutes and when it expires you make another API call to re-populate the cache.
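A minimal sketch of that approach, assuming Rails.cache is configured; the endpoint, the 5-minute TTL, and the omitted auth headers are example details only.
require "net/http"
require "json"
# Sketch: cache each Twitch API response under its URL for a few minutes.
def fetch_twitch(url)
  Rails.cache.fetch("twitch:#{url}", expires_in: 5.minutes) do
    JSON.parse(Net::HTTP.get(URI(url)))  # real calls also need auth headers
  end
end
streams = fetch_twitch("https://api.twitch.tv/helix/streams?user_login=example")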
Also I'd look at Varnish (https://www.varnish-cache.org/) which does a lot of HTTP caching really well. Could work really well for you and it has the concept of a grace period that tries to hide the expensive calls made when the cache expires.

IE and Chrome cache 'no-cache' images

Hi, I'm using fragment caching extensively in my Rails application. I've cached fragments that contain user images, so when a user changes his/her avatar (image), a lot of fragments have to be expired. To solve this, I've set things up so that a user's avatar URL is always constant ("/avatars/:user_id/thumbs").
I'm using Amazon S3 for storage, and the expires header for the images is "no-cache".
Sample image URL : https://s3.amazonaws.com/bucket_name/avatars/388/thumbs
In IE and Chrome I'm facing a peculiar problem. When a user changes his avatar, it is not reflected immediately on the client side. It takes 2-3 page refreshes for the image to change. Has anyone faced similar issues before?
You're going about this the wrong way in my opinion. Expire the fragment as you're doing, but allow Rails' cache-busting timestamps to force the browser to download new images.
e.g. s3.amazonaws.com/bucket_name/avatars/388/thumbs/filename.jpg?1230601161
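A sketch of that idea in a view helper, assuming the avatar URL is built from the constant path above and the user record is touched whenever the avatar changes; avatar_base_url is a hypothetical helper.
# Sketch: append the user's updated_at as a cache-busting query string.
def avatar_url(user)
  "#{avatar_base_url(user)}?#{user.updated_at.to_i}"  # avatar_base_url is hypothetical
end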

Why would you upload assets directly to S3?

I have seen quite a few code samples/plugins that promote uploading assets directly to S3. For example, if you have a user object with an avatar, the file upload field would load directly to S3.
The only way I see this being possible is if the user object is already created in the database and your S3 bucket + path is something like
user_avatars.domain.com/some/id/partition/medium.jpg
But then if you had an image tag that tried to access that URL when an avatar was not uploaded, it would yield a bad result. How would you handle checking for existence?
Also, it seems like this would not work well for most has_many associations. For example, if a user had many songs/MP3s, where would you store those and how would you access them?
Also, your validations will be shot.
I am having trouble thinking of situations where direct upload to S3 (or any cloud) is a good idea and was hoping people could clarify either proper use cases, or tell me why my logic is incorrect.
Why pay for storage/bandwidth/backups/etc. when you can have somebody in the cloud handle it for you?
S3 (and other Cloud-based storage options) handle all the headaches for you. You get all the storage you need, a good distribution network (almost definitely better than you'd have on your own unless you're paying for a premium CDN), and backups.
Allowing users to upload directly to S3 takes even more of the bandwidth load off of you. I can see the tracking concerns, but S3 makes it pretty easy to handle that situation. If you look at the direct upload methods, you'll see that you can force a redirect on a successful upload.
Amazon will then pass the following to the redirect handler: bucket, key, etag
That should give you what you need to track the uploaded asset after success. Direct uploads give you the best of both worlds. You get your tracking information and it unloads your bandwidth.
Check this link for details: Amazon S3: Browser-Based Uploads using POST
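A hypothetical sketch of such a redirect handler; the UploadsController name, the uploads association, and its columns are made up for illustration.
# Sketch: S3 redirects here after a successful browser upload,
# passing bucket, key and etag in the query string.
class UploadsController < ApplicationController
  def s3_callback
    upload = current_user.uploads.create!(
      s3_bucket: params[:bucket],
      s3_key:    params[:key],
      etag:      params[:etag]
    )
    redirect_to upload
  end
end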
If you are hosting your Rails application on Heroku, the reason could very well be that Heroku doesn't allow file uploads larger than 4MB:
http://docs.heroku.com/s3#direct-upload
So if you would like your users to be able to upload large files, this is the only way forward.
Remember how web servers work.
Unless you're using an async web setup of the sort you could achieve with Node.js or Erlang (just two examples), every upload request your web application serves ties up an entire process or thread while the file is being uploaded.
Imagine that you're uploading a file that's several megabytes large. Most internet users don't have tremendously fast uplinks, so your web server spends a lot of time doing nothing. While it's doing all of that nothing, it can't service any other requests. Which means your users start to get long delays and/or error responses from the server. Which means they start using some other website to get the same thing done. You can always have more processes and threads running, but each of those costs additional memory which eventually means additional $.
By uploading straight to S3, in addition to the bandwidth savings that Justin Niessner mentioned and the Heroku workaround that Thomas Watson mentioned, you let Amazon worry about that problem. You can have a single-process webserver effectively handle very large uploads, since it punts that actual functionality over to Amazon.
So yeah, it's more complicated to set up, and you have to handle the callbacks to track things, but if you deal with anything other than really small files (and even in those cases), why cost yourself more money?
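For the direct-upload setup itself, a sketch of generating the browser POST form with aws-sdk-s3's presigned_post; the bucket name, region and redirect URL are example values.
require "aws-sdk-s3"
# Sketch: build the policy and form fields for a browser form that posts
# straight to S3 and redirects back to the app on success.
bucket = Aws::S3::Resource.new(region: "us-east-1").bucket("example-bucket")
post = bucket.presigned_post(
  key: "uploads/${filename}",
  success_action_redirect: "https://example.com/uploads/s3_callback",
  content_length_range: 0..(10 * 1024 * 1024)
)
# post.url and post.fields feed the <form> that uploads directly to S3.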
