How to use multiple caches in Rails?

I have a rails application where I would like to use both memcached and the file store cache, for different purposes.
I want to use the file store cache to keep a large number of pages that don't change often (some not at all) - i.e. page caching - and use memcached for everything else (action and DB caching etc). The reason is that the pages stored on the file store cache are likely to require a large amount of storage, but individually most will be accessed infrequently.
Is this possible to do or will configuring memcached as the cache mean that it is also used for page caching?
As a secondary question, what is a safe way to remove pages from the file store cache in some form of cron job, since there does not seem to be an option to specify a TTL for this cache? For example, a UNIX find command would quickly find and remove all old pages, or pages that haven't been accessed in a long time. Is this safe to do given that the app server might try to serve one of those pages at that moment (though this is very unlikely)? If not, what is the best way to do this?

If you want to use the filesystem only for page caching and memcached for action and fragment caching, you're fine. Page caching always uses the filesystem. Just remember that page caching bypasses your Rails application, so you can't use it for pages that include content that changes from user to user or for pages that are access controlled with filters.
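If it helps, here's a minimal sketch of that split, assuming a Rails 3-era app (in Rails 4+ page and action caching moved into the actionpack-page_caching and actionpack-action_caching gems); the controller name and cache directory are placeholders:

# config/environments/production.rb
# Fragment and action caches go to memcached...
config.cache_store = :mem_cache_store, "localhost:11211"
# ...while cached pages are written as plain files on disk, regardless of cache_store.
config.action_controller.page_cache_directory = "#{Rails.root}/public/cached_pages"

# app/controllers/pages_controller.rb
class PagesController < ApplicationController
  caches_page   :show   # full HTML written under public/cached_pages, served by the web server
  caches_action :index  # stored in memcached via the cache_store configured above
end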
Regarding the removal of pages, on Unix, a file can be deleted, but it is not actually removed from disk until all open file handles are closed. If the app server has opened the file to serve a request, and the find command deletes it a split-second later, the app server doesn't suddenly get an error when it tries to read.
You could also consider having find delete files based on their last access time rather than creation or modification time, and using a sweeper in your Rails app to delete the cached page when its content is out of date.
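If you do go the cron route, here's a rough sketch of the cleanup as a rake task; the directory, file pattern, and 30-day cutoff are assumptions, and as noted above, deleting a file the app server is midway through reading is harmless:

# lib/tasks/page_cache.rake
namespace :page_cache do
  desc "Delete cached pages that have not been accessed in 30 days"
  task sweep: :environment do
    dir = Rails.root.join("public", "cached_pages")
    Dir.glob("#{dir}/**/*.html").each do |file|
      # atime is the last access time; tune the cutoff to your traffic
      File.delete(file) if File.atime(file) < 30.days.ago
    end
  end
end

Note that atime is only reliable if the filesystem isn't mounted with noatime.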

A simpler approach may be to use an HTTP cache upstream of your application as your page cache, rather than two stores within Rails. This way you can use HTTP headers to control the cache behavior, including TTLs. As a nice bonus, those same headers also apply to browsers' local caches.
Varnish is about as high-performance as it gets, but it would require setting up another moving piece in your hosting environment as a proxy. This may still be worthwhile depending on what you're doing.
Rack::Cache may be simpler still; it's easy to set up provided you're using a Rack-enabled version of Rails.
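A hedged sketch of that setup (the store paths and the 12-hour TTL are placeholders); the expires_in call emits the Cache-Control header that Rack::Cache, Varnish, and browsers all key off:

# Gemfile
gem "rack-cache"

# config/environments/production.rb
config.middleware.use Rack::Cache,
  metastore:   "file:#{Rails.root}/tmp/cache/rack/meta",
  entitystore: "file:#{Rails.root}/tmp/cache/rack/body"

# app/controllers/pages_controller.rb
class PagesController < ApplicationController
  def show
    expires_in 12.hours, public: true  # Cache-Control: public, max-age=43200
    @page = Page.find(params[:id])     # hypothetical model
  end
end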

Related

Rails ActiveStorage: how to avoid one redirect for each image?

If you use ActiveStorage and you have a page with N images you get N additional requests to your Rails app (i.e. N redirects). That means wasting a lot of server resources if you have tens of images on a page.
I know that the redirect is useful for signed URLs. However I wonder why Rails does not precompute the final signed URL and embed that into the HTML page... In this way we could keep the advantages of signed URLs / protected files, without making N additional calls to the Rails server.
Is it possible to include the final URL / pre-signed URL of image variants directly in the HTML (thus avoiding the redirect)? Otherwise, why is that impossible?
After days of reasoning and testing, I am really excited about my final solution, which I explain below. This is an opinionated approach to images and may not represent the current Rails Way™️, however it has incredible advantages for websites that serve many public images, in particular:
When you serve a page with N images you don't get 1 + N requests to your app server; you get only 1 request for the page
The images are served through a CDN, which improves loading time
The bucket is not completely public; instead it is protected by Cloudflare
The images are cached by Cloudflare, which greatly reduces your S3 bill
You greatly reduce the number of API requests (e.g. existence checks) to S3
This solution does not require large changes to Rails, so it is straightforward to switch back to Rails' default behavior in case of problems
Here's the solution:
Create an S3 bucket and configure it to host a public website (e.g. name it storage.example.com) - you can even disable public access at the bucket level and allow access only to the Cloudflare IPs using a bucket policy
Go to Cloudflare and configure a CNAME for storage.example.com that points to the bucket's S3 website endpoint; you need to use Flexible SSL (you can use a page rule for the subdomain); use page rules to set heavy caching: set Cache Everything and a very long value (e.g. 1 year) for Browser Cache TTL and Edge Cache TTL
In your Rails application you can keep using private storage / ACLs, which is the default Rails behavior
In your Rails application call @post.variant(...).processed after every update or creation of @post; then in your views use 'https://storage.example.com/' + @post.variant(...).key (note that we don't call processed here in the views, to avoid additional checks against S3); you can also have a rake task that calls processed on each object, in case you need to regenerate the variants; this works perfectly if you have only a few variants (e.g. 1 image / variant per post) that change infrequently
Most of the above steps are optional, so you can combine them based on your needs.
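For the view side of the last step, a sketch of a helper that builds the CDN URL straight from the variant key; the host, helper name, attachment name, and resize options are assumptions, and the variant must already have been processed as described above:

# app/helpers/cdn_helper.rb
module CdnHelper
  CDN_HOST = "https://storage.example.com".freeze

  # Direct CDN URL for an already-processed variant: no redirect through the Rails app,
  # and no call to processed here, so no existence check against S3 at render time.
  def cdn_variant_url(attachment, transformations)
    "#{CDN_HOST}/#{attachment.variant(transformations).key}"
  end
end

# Usage in a view, assuming a cover attachment on Post:
#   image_tag cdn_variant_url(@post.cover, resize_to_limit: [800, 800])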
You can use the service_url to create direct links to your resources.
We don't use Rails views in our project so my knowledge about the view layer is rusty. I think you could put it in a dedicated helper and then use it from your views.
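A short sketch, assuming an attachment named image (note that service_url was renamed to url in Rails 6.1):

# Signed, expiring URL pointing straight at the storage service -- no redirect
# through the Rails app. Works for the original blob and for processed variants.
@post.image.service_url
@post.image.variant(resize_to_limit: [400, 400]).processed.service_url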

From the server, how do I force an external file to expire so that the browser receives a fresh one?

I have a show view that uses a 'Universal Viewer' to load images. The image dimensions come from a JSON file served by an IIIF image server.
I fixed a bug and a new JSON file exists, but the user's browser is still using the old info.json file.
I understand that I could just have them do a hard-reload, like I myself did on my machine, but many users may be affected, and I'm just damn curious now.
Modern browsers all ship with cache-control functionality baked in. Using a combination of ETags and Cache-Control headers, you can accomplish what you seek without having to change the file names or use cache-busting query parameters.
ETags allow you to communicate a token to a client that will tell their browser to update the cached version. This token can be created based on the content creation date, content length, or a fingerprint of the content.
Cache-Control headers allow you to create policies for web resources about how long, by whom, and how your content can be cached.
Using ETags and Cache-Control headers is a useful way to tell clients when to update their cache, whether you're serving IIIF or any other content. However, adding them can be quite specific to your local implementation. Many frameworks (like Ruby on Rails) have much of this functionality baked in. There may also be web server configuration to modify; sample configurations that use these strategies are available from the HTML5 Boilerplate project.
Sample Apache configurations for:
ETags https://github.com/h5bp/server-configs-apache/blob/master/src/web_performance/etags.conf
Cache expiration https://github.com/h5bp/server-configs-apache/blob/master/src/web_performance/expires_headers.conf
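In Rails itself, a hedged sketch of what that "baked in" support looks like for the action serving the info.json (the controller and model names are hypothetical):

# app/controllers/manifests_controller.rb
def info
  manifest = IiifManifest.find(params[:id])  # hypothetical model backing info.json
  expires_in 1.hour, public: true            # Cache-Control: public, max-age=3600
  # Sets ETag and Last-Modified, and returns 304 Not Modified while they still match.
  fresh_when etag: manifest, last_modified: manifest.updated_at
end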
It depends on where the JSON file is being served from, and how it's being cached.
The guaranteed way to expire the cache on the file is to change the filename every time it changes. This is typically done by renaming it filename-MD5HASH.ext, where MD5HASH is the MD5 hash of the file.
If you can't change the file name (it comes from a source you can't control), you might be able to get away with adding a cache-busting query key to the URL, something like http://example.com/file.ext?q=123456.
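A tiny sketch of the query-key approach in Ruby, deriving the version from something that changes with the file (the mtime here assumes the file lives on your own disk):

# Appends a cache-busting version so the URL changes whenever the file does.
def busted_url(base_url, version)
  "#{base_url}?q=#{version}"
end

busted_url("http://example.com/file.ext", File.mtime("public/file.ext").to_i)
# => something like "http://example.com/file.ext?q=1712345678"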

PDF caching on heroku with cloudflare

I'm having a problem getting the caching I need to work using CloudFlare.
We use CloudFlare for caching all our assets on S3, which works 100%, using a separate subdomain, cdn
We also use CloudFlare for our main site (hosted on Heroku), e.g. www
My problem is I can't get CloudFlare to cache PDFs that are generated from our Rails app. I'm using the WickedPDF gem to dynamically generate certain PDFs for invoices, etc. I don't want to upload these as files to say S3 but we would like to have CloudFlare cache these so they don't get generated each and every time, as the time spent generating these PDFs is a little intensive.
CloudFlare is turned on and is "accelerating" for the subdomain in question and we're using SSL, but PDFs never seem to cache properly.
Is there something else we need to do to ensure these get cached? Or maybe there's another solution that would work for Heroku? (e.g. we can't use page caching since it relies on the filesystem) I also checked the WickedPDF documentation to see if we could do anything else, but found nothing about expiry controls.
Thanks,
We should actually cache it as long as the resources are on-domain & not being delivered through a third-party resource in some way.
Keep in mind:
1. Our caching depends on the number of requests for the resources (at least three).
2. Caching is very much data center dependent (in other words, if your site receives a lot of traffic at a data center it is going to be cached; if your site doesn't get a lot of traffic in another data center it may not cache).
I would open a support ticket if you're still having issues.
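If the missing piece turns out to be cache headers coming from the app itself, here's a hedged sketch of sending them from the WickedPDF action (the action name and one-day TTL are assumptions):

# app/controllers/invoices_controller.rb
def show
  expires_in 1.day, public: true   # gives CloudFlare (and browsers) an explicit Cache-Control to honor
  render pdf:      "invoice-#{params[:id]}",
         template: "invoices/show"
end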

How do I dynamically create cached assets that allow browser caching in Ruby on Rails?

I have a web application that uses .js files filled with data to drive the front-end. These files can be large, so I want the browser to cache them. They are static data available to all the users, so I also want to have them page cached, as their creation can be time consuming.
The data that drives them changes at random intervals at the back end. Thus, when the data changes, I want to invalidate the server page cache AND the user's browser cache, causing a refresh.
The application also has a large number of static assets, and we use the asset pipeline with precompilation to provide them in production (no dynamic compilation).
How can I page-cache these files so they get served quickly to all users without hitting the full Rails stack, and browser-cache them as well, yet also invalidate both when the data changes?
Maybe it's as easy as
javascript_include_tag 'jquery.js', 'jquery-ui.js', :cache => 'cached/all'
Take a look at the Rails API docs on caching; the Jammit gem could be useful as well.
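For the page-cache half plus browser invalidation, a rough sketch under assumptions: Rails 3-era caches_page, a hypothetical Record model, and a versioned URL so browsers refetch only when the data actually changes:

# app/controllers/data_controller.rb
class DataController < ApplicationController
  caches_page :show        # writes public/data/show.js; later hits never touch the Rails stack

  def show
    @records = Record.all  # hypothetical data source; renders app/views/data/show.js.erb
  end
end

# In the layout, reference a versioned URL so a data change busts browser caches
# (the web server ignores the query string and keeps serving the same cached file):
#   <script src="/data/show.js?v=<%= Record.maximum(:updated_at).to_i %>"></script>
#
# When the data changes, expire the server-side copy too (e.g. from a sweeper or callback):
#   expire_page(controller: "data", action: "show", format: "js")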

Make an ASP.NET MVC application Web Farm Ready

What is the most efficient way to make an ASP.NET MVC application web-farm ready?
Most importantly, sharing the current user's information (context) and, less importantly, cached objects such as look-up items (states, street types, counties, etc.).
I have heard of/read about memcached but haven't seen simple, applicable documentation on how to implement and test it.
Request context
Any request that hits a web farm gets served by an available IIS server. Context gets created there and the whole request gets served by the same server, so context shouldn't be a problem. A request is a stateless execution pipeline, so it doesn't need to share data with other servers in any way, shape, or form; it will be served from beginning to end by the same machine.
User information is read from a cookie and processed by the server that serves the request. What matters then is whether you cache the complete user object somewhere.
Session
If you use the TempData dictionary, you should be aware that it's stored in the Session dictionary. In a server farm that means you should use something other than InProc sessions, because those aren't shared between IIS servers across the farm. Configure an out-of-process session store instead, one backed by a database or the ASP.NET State Server.
Cache
Cache is a different story. To make it as efficient as possible, the cache should ideally be shared as well; by default it's not. But caching basically means that when something isn't in the cache, it gets read from the source and stored in the cache. So if a particular server in the farm doesn't have some cached object, it will create it, and over time each server will end up caching the shared, publicly used data.
Or... you could use libraries like memcached (as you mentioned) and take advantage of a shared cache. There are several examples on the net of how to use it.
But these solutions all bring additional overhead (network round-trips, a separate process doing the fetching, and so on). The default in-process cache is the fastest, so only opt into a shared cache if you explicitly need one. Don't share cache unless really necessary.
