PDF caching on Heroku with CloudFlare - ruby-on-rails

I'm having a problem getting the caching I need to work using CloudFlare.
We use CloudFlare to cache all our assets on S3, which works perfectly using a separate subdomain (cdn).
We also use CloudFlare for our main site (hosted on Heroku), e.g. www.
My problem is that I can't get CloudFlare to cache PDFs generated by our Rails app. I'm using the WickedPDF gem to dynamically generate certain PDFs for invoices, etc. I don't want to upload these as files to, say, S3, but we would like CloudFlare to cache them so they aren't regenerated on every request, since generating these PDFs is somewhat intensive.
CloudFlare is turned on and is "accelerating" for the subdomain in question, and we're using SSL, but the PDFs never seem to be cached.
Is there something else we need to do to ensure these get cached? Or maybe there's another solution that would work on Heroku? (e.g. we can't use page caching since it relies on the filesystem.) I also checked the WickedPDF documentation to see if we could do anything else, but found nothing about expiry controls.
Thanks,

We should actually cache it as long as the resources are on-domain & not being delivered through a third-party resource in some way.
Keep in mind:
1. Our caching depends on the number of requests for the resources (at least three).
2. Caching is very much data center dependent (in other words, if your site receives a lot of traffic at a data center it is going to be cached; if your site doesn't get a lot of traffic in another data center it may not cache).
I would open a support ticket if you're still having issues.
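One thing worth checking on the Rails side is the Cache-Control header: by default Rails marks dynamic responses as private, which edge caches generally won't store. As a rough sketch only, with hypothetical names (InvoicesController, @invoice) that are not from the question, the WickedPDF action could send cacheable headers like this:

class InvoicesController < ApplicationController
  def show
    @invoice = Invoice.find(params[:id])

    # Allow shared caches such as CloudFlare to keep the generated PDF
    # for a day instead of regenerating it on every request.
    expires_in 1.day, public: true

    respond_to do |format|
      format.pdf do
        render pdf: "invoice-#{@invoice.id}",   # WickedPDF
               template: "invoices/show",
               formats: [:html]
      end
    end
  end
end

Even with headers like these, whether the edge actually stores the PDF is still subject to the request-count and per-data-center behavior described above.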

Related

Rails ActiveStorage: how to avoid one redirect for each image?

If you use ActiveStorage and you have a page with N images you get N additional requests to your Rails app (i.e. N redirects). That means wasting a lot of server resources if you have tens of images on a page.
I know that the redirect is useful for signed URLs. However I wonder why Rails does not precompute the final signed URL and embed that into the HTML page... In this way we could keep the advantages of signed URLs / protected files, without making N additional calls to the Rails server.
Is it possible to include the final URL / pre-signed URL of image variants directly in the HTML (thus avoiding the redirect)? Otherwise, why is that impossible?
After days of reasoning and tests, I am really excited about my final solution, which I explain below. This is an opinionated approach to images and may not represent the current Rails Way™️, however it has incredible advantages for websites that serve many public images, in particular:
When you serve a page with N images you don't get 1 + N requests to your app server, instead you get only 1 request for the page
The images are served through a CDN and this improves the loading time
The bucket is not completely public, instead it is protected by Cloudflare
The images are cached by Cloudflare, which greatly reduces your S3 bill
You greatly reduce the number of API requests (i.e. exists) to S3
This solution does not require large changes to Rails, and thus it is straightforward to switch back to Rails default behavior in case of problems
Here's the solution:
Create an S3 bucket and configure it to host a public website (e.g. call it storage.example.com) - you can even disable public access at the bucket level and allow access only to the Cloudflare IPs using a bucket policy
Go to Cloudflare and configure a CNAME for storage.example.com that points to your domain; you need to use Flexible SSL (you can use a page rule for the subdomain); use page rules to set heavy caching: set Cache Everything and set a very long value (e.g. 1 year) for Browser Cache TTL and Edge Cache TTL
In your Rails application you can keep using private storage / ACLs, which is the default Rails behavior
In your Rails application, call @post.variant(...).processed after every update or creation of @post; then in your views use 'https://storage.example.com/' + @post.variant(...).key (note that we don't call processed here in the views, to avoid additional checks against S3); you can also have a rake task that calls processed on each object, in case you need to regenerate the variants; this works perfectly if you have only a few variants (e.g. 1 image / variant per post) that change infrequently
Most of the above steps are optional, so you can combine them based on your needs.
You can use the service_url to create direct links to your resources.
We don't use Rails views in our project so my knowledge about the view layer is rusty. I think you could put it in a dedicated helper and then use it from your views.
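To make the steps above concrete, here is a minimal sketch of such a view helper; the host constant and the attachment/variant arguments are assumptions rather than part of the original answer, and it presumes processed has already been called (e.g. after save) so the variant file already exists in the bucket:

module CdnImageHelper
  # Hypothetical CDN host, matching the storage.example.com CNAME above.
  CDN_HOST = "https://storage.example.com".freeze

  # Builds a direct, cacheable URL for an already-processed variant,
  # bypassing the ActiveStorage redirect controller entirely.
  def cdn_variant_url(attachment, transformations)
    "#{CDN_HOST}/#{attachment.variant(transformations).key}"
  end
end

In a view this becomes something like image_tag cdn_variant_url(@post.cover, resize: "300x300") (the cover attachment name is a placeholder), keeping the page at a single request to the Rails app as described in the first bullet above.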

"Request has expired" when using S3 with Active Storage

I'm using ActiveStorage for the first time.
Everything works fine in development, but in production (Heroku) my images disappear for no apparent reason.
They displayed fine at first, but now no image is shown. In the console I can see this error:
GET https://XXX.s3.amazonaws.com/variants/Q7MZrLyoKKmQFFwMMw9tQhPW/XXX 403 (Forbidden)
If I try to visit that URL directly I get this XML error:
<Error>
<Code>AccessDenied</Code>
<Message>Request has expired</Message>
<X-Amz-Expires>300</X-Amz-Expires>
<Expires>2018-07-24T13:48:25Z</Expires>
<ServerTime>2018-07-24T15:25:37Z</ServerTime>
<RequestId>291D41FAC6708334</RequestId>
<HostId>lEVGuwA6Hvlm/i40PeXaje9SEBYks9+uk6DvBs=</HostId>
</Error>
This is what I have in the view
<div class="cover" style="background-image: url('<%= rails_representation_path(experience.thumbnail) %>')"></div>
This is what I have in the model
def thumbnail
  self.cover.variant(resize: "300x300").processed
end
In simple words, I don't want the images to expire; I want them to always be available.
Thanks
ActiveStorage does not support non-expiring links. It uses expiring (private) links, and supports uploading files to your service only as private.
It was a problem for me too, so I wrote 2 patches (use with caution) for S3 only: a simple one (~30 lines) that overrides ActiveStorage to work only with non-expiring (public) links, and another that adds an acl option to the has_one_attached and has_many_attached methods.
Hope it helps.
Your question doesn't say so, but it's common to use a CDN like AWS CloudFront with a Rails app. Especially on Heroku you probably want to conserve compute power.
Here is what happens in that scenario. You render a page as usual, and all the images are requested from the asset host, which is the CDN, because that's how it is configured to integrate. It's set up to fetch anything it doesn't find in its cache from the origin, which is your application again.
At first, all image requests are passed through: the ActiveStorage controller creates signed URLs for them, and the CDN passes them on but also caches them.
Now comes the problem. The signed URL expires in 5 minutes by default, but the CDN caches usually much longer. This is because usually you use digest assets, meaning they are invalidated not by time but by name, on any change.
The solution is simple. Increase the expiry of the signed URL to be longer than the cache's TTL. Now the cache drops the cached signed URL before it becomes invalid.
Set the URL expiry using ActiveStorage::Service.url_expires_in in 5.2, or directly via Rails.application.config.active_storage.service_urls_expire_in in an initializer (see this answer for details).
To set the cache TTL in CloudFront: open the AWS console, pick the distribution, open the Behavior tab, and scroll down to the TTL fields (Minimum TTL, Maximum TTL, Default TTL).
Then optionally issue an invalidation to force re-caching of all contents.
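For illustration, a minimal configuration sketch (the one-week value is an example; pick anything comfortably longer than the CDN's TTL):

# config/environments/production.rb (or wherever you keep this setting)
# Signed URLs must outlive the CDN's Edge/Default TTL so that a cached
# redirect never points at an already-expired URL.
config.active_storage.service_urls_expire_in = 1.week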
Keep in mind there is a security trade-off. If the image contents are private, then they most likely don't belong in a CDN, and shouldn't have long-lasting temporary URLs either. In that case choose a solution that exempts attachments from the CDN altogether. Your application will then have to handle the additional load of signing all attached assets' URLs on top of rendering the relevant page.
Further, keep in mind that this isn't necessarily a good solution, but more of a workaround. With the above setup you will cache redirects, and the heavier requests will hit your storage bucket directly. The usual scenario for CDNs is large media, not lightweight redirects. You do relieve the app of handling a lot of requests, though. Whether that is a worthwhile optimization is something you should look into.
I had this same issue, but after I corrected the time on my computer the problem was resolved. It was caused by a clock difference that the AWS servers did not accept (too much skew invalidates the signed request).
# config/environments/production.rb
Change
config.active_storage.service = :local
to
config.active_storage.service = :amazon
The symbol should match whatever you named the AWS/Amazon service in storage.yml.

Automatically upload assets to S3 without assets_sync

I just did my first deploy to Heroku, and apart from my images the assets work. I was reading about how to move the assets to S3 (and then cache them with CloudFront) when I found this gist:
https://gist.github.com/schneems/9374188
"I hate asset_sync"
Using asset sync can cause failures, is difficult to debug, un-needed, and adds extra complexity. Don't use it. Instead use https://devcenter.heroku.com/articles/using-amazon-cloudfront-cdn
The problem is, I can't find how to sync assets automatically like the gem does. What's the best alternative to using the asset_sync gem?
Although this is an old question, in case anyone finds it and is hoping for an answer, here are my own findings.
Cloudfront has, for some time now, allowed users to set an origin value in their configuration. You want to set that to your application host. If you were deploying to a site accessible via https://myapp.com, then you would use that as your Cloudfront origin. Any cache misses from Cloudfront will then route to your application layer at https://myapp.com, appending whatever path info is present in the request (e.g. /assets/css/whatever.css). This means that your application must be able to serve these static assets. If it can, then you're all set. If not, check the Rails guide for how to enable that.
Caveat! You cannot use a non-publicly-accessible URL for the origin. What does that mean? If you are provisioning your own pre-production application instances that are hidden behind a VPC, for example, then you cannot use those instances for your origin. Cloudfront cannot be granted special access to your instances. There's a workaround if you read Cloudfront's documentation on serving private content; basically you make your application publicly-accessible to anyone with the appropriate links but you enforce application-level constraints to disallow access to anyone not using a specially signed URL or cookie.
This is very old, but the answers here are not satisfying. The gist and articles linked do not intend to say the asset_sync gem is bad. The article intends to say you shouldn't be serving your assets from anywhere (S3 or not) but your application. Uploading your assets anywhere runs into the same issues mentioned in the gist, whether or not you're using this particular gem.
The suggestion is to serve your assets from your application and then use a CDN such as Cloudfront to ease the load on your server. You do not need to upload anything to Cloudfront. Instead, Cloudfront will ask your application for assets on demand and cache them automatically.
However, Cloudfront (or any CDN) does not address every use case of asset_sync. If you're running into slug size issues and you've already cleaned repos and unnecessary dependencies, you might still need asset_sync to get your slug under 500MB and keep Heroku happy. Of course you can forgo asset_sync and use a custom implementation to upload to S3 (or anywhere else).
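For reference, a minimal sketch of the Rails side of that setup; the CloudFront domain is a placeholder and the env-var toggle is the usual Heroku convention, not something taken from the answers above:

# config/environments/production.rb
# Point asset URLs at the CDN; on a cache miss, CloudFront fetches the
# file from its origin (the app) and caches it.
config.asset_host = "https://d1234example.cloudfront.net"

# The app itself must be able to serve /assets for those cache misses;
# Heroku enables this through the env var below.
config.public_file_server.enabled = ENV["RAILS_SERVE_STATIC_FILES"].present?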

best method for serving static pages hosted on s3 in rails app on heroku

I have a Rails app which is dynamic and works well on its own, but we also have an S3 bucket with a bunch of HTML pages that are constantly updated and revised.
I'm looking for an overall solution that allows me to route requests to the static files for a large number of potential pages, but also use the app for dynamic pages. None of the static pages require a user login, but all of the dynamic pages do.
We are also currently using heroku to serve the application which is something else to take into consideration.
What are some methods/gems/ideas for how to serve these static pages quickly on the same domain without interfering with the rest of the app?
According to Heroku you have 3 options:
You can use page caching. Page caching has been removed from Rails 4 but is still available as an external gem (actionpack-page_caching); however, from what I can see, it does not work as expected on Heroku (the cached files are not persisted, since the dyno filesystem is ephemeral).
You can cache through HTTP headers (which actually caches data on the client).
You can use Rack::Cache as a reverse proxy.
I think the best solution for your case is the third one. Heroku suggests that you use Rack::Cache, and of course you can combine that with the HTTP headers.
You can find more details here and here
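As a rough sketch of the third option combined with HTTP headers, assuming the rack-cache gem is in the Gemfile and a hypothetical StaticPagesController renders or proxies the S3-hosted pages (fetch_page_from_s3 is an illustrative helper, not shown):

# config/environments/production.rb
config.action_dispatch.rack_cache = true

# app/controllers/static_pages_controller.rb
class StaticPagesController < ApplicationController
  def show
    # public + max-age lets Rack::Cache (and browsers) reuse the response
    # for 12 hours without hitting the app or S3 again.
    expires_in 12.hours, public: true
    render html: fetch_page_from_s3(params[:slug]).html_safe, layout: "application"
  end
end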

How to use multiple caches in rails?

I have a rails application where I would like to use both memcached and the file store cache, for different purposes.
I want to use the file store cache to keep a large number of pages that don't change often (some not at all) - i.e. page caching - and use memcached for everything else (action and DB caching etc). The reason is that the pages stored on the file store cache are likely to require a large amount of storage, but individually most will be accessed infrequently.
Is this possible, or will configuring memcached as the cache store mean that it is also used for page caching?
As a secondary question, what is a safe way to remove pages from the file store cache in some form of cron job, given that there does not seem to be an option to specify a TTL for this cache? For example, a UNIX find command would quickly find and remove all old pages, or pages that haven't been accessed in a long time. Is this safe to do, given that the app server might be trying to serve one of those pages at that moment (though this is very unlikely)? If not, what is the best way to do it?
If you want to use the filesystem only for page caching and memcached for action and fragment caching, you're fine. Page caching always uses the filesystem. Just remember that page caching bypasses your Rails application, so you can't use it for pages that include content that changes from user to user or for pages that are access controlled with filters.
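For instance, a configuration along these lines keeps the two stores separate (the memcached address and the cache directory are placeholders; in Rails 4+ page caching comes from the actionpack-page_caching gem, and :mem_cache_store needs the dalli gem):

# config/environments/production.rb
# Fragment/action caching and Rails.cache go to memcached...
config.cache_store = :mem_cache_store, "localhost:11211"

# ...while page caching always writes plain HTML files to disk,
# regardless of cache_store; the directory is configurable.
config.action_controller.page_cache_directory = "#{Rails.root}/public/cached_pages"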
Regarding the removal of pages, on Unix, a file can be deleted, but it is not actually removed from disk until all open file handles are closed. If the app server has opened the file to serve a request, and the find command deletes it a split-second later, the app server doesn't suddenly get an error when it tries to read.
You could also consider having find delete files based on their last access time, instead of creation or modification, and using a sweeper in your Rails app to delete the cached page when its content is out of date.
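If you do go the cron route, here is a rake-task sketch in the same spirit as the find-by-access-time idea; the directory and the 30-day threshold are assumptions:

# lib/tasks/page_cache.rake
namespace :page_cache do
  desc "Remove cached pages not accessed in the last 30 days"
  task sweep: :environment do
    dir = Rails.root.join("public", "cached_pages").to_s
    cutoff = 30.days.ago

    Dir.glob("#{dir}/**/*.html").each do |path|
      # Unlinking is safe even mid-request: the inode stays alive until
      # every open file handle on it has been closed.
      File.delete(path) if File.atime(path) < cutoff
    end
  end
end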
A simpler approach may be to use an HTTP cache upstream of your application as your page cache, rather than two stores within Rails. This way you can use HTTP headers to control the cache behavior, including TTLs. The same limits will also apply to browsers' local caches as a nice bonus.
Varnish is about as high performance as it gets, but would require setting up another moving piece in your hosting environment as a proxy. This may still be worthwhile depending on what you're doing.
A simpler approach might be Rack::Cache, which will be easy to set up provided you're using a Rack-enabled version of Rails.
