Rails caching: Expiring multiple pages for one action - ruby-on-rails

I've set up action caching (with sweepers, but I guess that's irrelevant here) in my app, and so far it works great except for one thing:
I use Kaminari for pagination, and thus when I execute expire_action on my action it only expires the first page. Since I know caching won't work when the page is specified via the query string, I've set up a route so the page is appended to the end of the URL (for example /people/123/page/2).
I'll add more info to this post if necessary, but I'm guessing there is something obvious I'm missing here, so: Anyone know how to expire the rest of my pages?

I'm still interested in an answer to my original question and will change my accepted answer should a solution come up. That said, I ended up only caching the original page by checking if a page was specified at all:
caches_action :index, :if => Proc.new { params[:page].nil? }

Here is a solution I've thought of, facing the same problem, though I haven't implemented it yet: cache the actual expiry time in its own key. The key would be a canonical representation of the search URL, i.e. without the "page" parameter. For example:
The user searches on http://example.com?q=foo&page=3, so params is { q: 'foo', page: 3 }. Strip out "page=3" and we're left with { q: 'foo' }.
Run to_param on it and add some prefix, and we're left with a cache key like search_expiry_q=foo.
Look up the cache for this canonical query, i.e. Rails.cache.read("search_expiry_q=foo"). If it exists, we'll make our result expire at that time. Unfortunately, we only have expires_in, not expires_at, so we'll have to do a calculation: something like expires_in: expiry_time - Time.now - 5.seconds (the 5 seconds hopefully prevents any race conditions). We cache the full URL/params this way.
OTOH if there's no expiry, then no-one's performed the search recently. So we do:
expiry_time = Time.now + 1.hour
Rails.cache.write("search_expiry_q=foo", expiry_time, expires_in: 1.hour)
And cache this fragment/page, again with full URL/params, and expires_in: 1.hour.
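A minimal sketch of the whole scheme, assuming plain hash params; canonical_cache_key and fetch_page are hypothetical helper names, not tested code:

def canonical_cache_key(params)
  # Strip the page parameter so all pages of one search share one expiry key.
  'search_expiry_' + params.except(:page, :controller, :action).to_param
end

def fetch_page(params, &block)
  key = canonical_cache_key(params)
  expiry_time = Rails.cache.read(key)
  unless expiry_time
    # No-one has performed this search recently: open a fresh expiry window.
    expiry_time = Time.now + 1.hour
    Rails.cache.write(key, expiry_time, expires_in: 1.hour)
  end
  # Cache this page (full URL/params in the key) so every page of the search
  # expires together; the 5-second buffer guards against race conditions.
  Rails.cache.fetch(params.to_param, expires_in: expiry_time - Time.now - 5.seconds, &block)
end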

Related

Prevent bots from accessing rails active_storage images

My site has a large number of graphs which are recalculated each day as new data is available. The graphs are stored on Amazon S3 using active_storage. A typical example would be
# app/models/graph.rb
class Graph < ApplicationRecord
has_one_attached :plot
end
and in the view
<%= image_tag graphs.latest.plot %>
where graphs.latest retrieves the latest graph. Each day, a new graph and attached plot is created and the old graph/plot is deleted.
A number of bots, including Google's and Yandex's, are indexing the graphs, but they then generate exceptions when they return and access the image again at URLs like
www.myapp.com/rails/active_storage/representations/somelonghash
Is there a way to produce a durable link for the plot that does not expire when the graph/plot is deleted and then recalculated? Failing that, is there a way to block bots from accessing these plots?
Note that I currently have a catchall at the end of the routes.rb file:
get '*all', to: 'application#route_not_found', constraints: lambda { |req|
  req.path.exclude? 'rails/active_storage'
} if Rails.env.production?
The exclusion of active storage in the catchall is in response to this issue. It is tempting to remove the active_storage exemption, but that might then break legitimate active_storage routes.
Maybe I can put something in rack_rewrite.rb to fix this?
Interesting question.
A simple solution would be to use the send_data functionality to send the image directly. However, that has its own issues, mostly increased server bandwidth usage (and reduced server performance). Still, a solution like that is what you need if you don't want to go to the trouble of the below, namely creating a redirect model and the logic around it.
Original Answer
The redirect will require setting up some sort of Redirects::Graph model, which verifies that a graph was deleted and redirects to the new graph instead of the requested one. It would have two fields: an old_signed_id (the biglonghash) and a new_signed_id.
We'll need to populate the redirects model every time you delete a graph, and add a new entry every time a new graph is created (we should be able to generate the signed_id from the blob somehow).
For performance, and to avoid long runs of redirects (which may result in a different error/issue), you'll have to manage chains. I.e.: say you have a redirect A => B, and you delete B, replacing it with C; you now need A => C and B => C (to avoid an A => B => C redirect). This chain could get rather long indeed. It can be handled efficiently by adding an index on new_signed_id and doing Redirects::Graph.where(new_signed_id: old_signed_id).update_all(new_signed_id: new_signed_id) to update all the relevant old redirects whenever you regenerate the graph, as sketched below.
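A hypothetical sketch of that bookkeeping, assuming the Redirects::Graph model above; replace_plot! is an illustrative name, and blob.signed_id is ActiveStorage's signed blob id:

class Graph < ApplicationRecord
  has_one_attached :plot

  # Swap in a freshly rendered plot and record a redirect from the old blob.
  def replace_plot!(new_io, filename:)
    old_id = plot.attached? ? plot.blob.signed_id : nil
    plot.attach(io: new_io, filename: filename)
    new_id = plot.blob.signed_id
    return unless old_id

    # Collapse chains: anything that pointed at the old blob now points
    # straight at the new one (A => B plus B => C becomes A => C).
    Redirects::Graph.where(new_signed_id: old_id)
                    .update_all(new_signed_id: new_id)
    Redirects::Graph.create!(old_signed_id: old_id, new_signed_id: new_id)
  end
end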
The controller itself is trickier; the cleanest method I can think of is to monkey patch ActiveStorage::RepresentationsController to add a before_action that does something like this (it may not work as-is; the params[:signed_id] and the representations path helper may not be right):
before_action :redirect_if_needed

def redirect_if_needed
  redirect_model = Redirects::Graph.find_by(old_signed_id: params[:signed_id])
  redirect_to rails_activestorage_representations_path(
    signed_id: redirect_model.new_signed_id
  ) if redirect_model.present?
end
If you have version control set up for your database (e.g. the Papertrail gem or something), you may be able to work out the old_signed_id and new_signed_id pairs with a bit of work and build the redirects for the URLs currently causing errors. Otherwise, sadly, this approach will only prevent future errors, and it may be impossible to get the currently broken URLs working.
Ideally, though, you would update the blob itself to use the new graph instead of deleting it, but I'm not sure that's possible/practical.
Have you tried?
In the file /robots.txt put:
User-agent: *
Disallow: /rails/active_storage*

Rails: Model.find() or Model.find_by_id() to avoid RecordNotFound

I just realized I had a very hard-to-find bug on my website. I frequently use Model.find to retrieve data from my database.
A year ago I merged three websites, causing a lot of redirections that needed to be handled. To do so, I created a "catch all" functionality in my application controller like this:
around_filter :catch_not_found

def catch_not_found
  yield
rescue ActiveRecord::RecordNotFound
  require 'functions/redirections'
  handle_redirection(request.path)
end
in addition I have this at the bottom of my routes.rb:
match '*not_found_path', :to => 'redirections#not_found_catcher', via: :get, as: :redirect_catcher, :constraints => lambda{|req| req.path !~ /\.(png|gif|jpg|txt|js|css)$/ }
Redirection-controller has:
def not_found_catcher
  handle_redirection(request.path)
end
I am not sure these things are relevant to this question, but I guess it is better to mention them.
My actual problem
I frequently use Model.find to retrieve data from my database. Let's say I have a Product-model with a controller like this:
def show
  @product = Product.find(params[:id])
  @product.country = Country.find(...some id that does not exist...)
end

# View
<%= @product.country.name %>
This is something I use in some 700+ places in my application. What I realized today is that even though the Product model will be found, calling Country.find() and NOT finding anything raises a RecordNotFound, which in turn causes a 404 error.
I had built my app around the expectation that @product.country would be nil if that Country couldn't be found by the .find search. I know now that is not the case - it raises RecordNotFound. Basically, if I load Product#show I will get a 404 page where I would expect a 500 error (since @product.country would be nil and nil.name should not work).
My question
My big question now. Am I doing things wrong in my app, should I always use Model.find_by_id for queries like my Country.find(...some id...)? What is the best practise here?
Or, does the problem lie within my catch all in the Application Controller?
To answer your questions:
should I always use Model.find_by_id
If you want to find by an id, use Country.find(...some id...). If you want to find by something else, use e.g. Country.find_by(name: 'Australia'). The find_by_name syntax is no longer favoured in Rails 4.
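For reference, the behavioural difference that bit you (standard ActiveRecord semantics):

Country.find(999)        # raises ActiveRecord::RecordNotFound if id 999 is missing
Country.find_by(id: 999) # returns nil instead of raising
Country.find_by_id(999)  # same as find_by(id: 999), via the dynamic finder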
But that's an aside, and is not your problem.
Or, does the problem lie within my catch all in the Application Controller?
Yeah, that sounds like a recipe for pain to me. I'm not sure what specifically you're doing or what the nature of your redirections is, but based on the vague sense I get of what you're trying to do, here's how I'd approach it:
Your Rails app shouldn't be responsible for redirecting routes from your previous websites / applications. That should be the responsibility of your webserver (eg nginx or apache or whatever).
Essentially you want to make a big fat list of all the URLs you want to redirect FROM, and where you want to redirect them TO, and then format them in the way your webserver expects, and configure your webserver to do the redirects for you. Search for eg "301 redirect nginx" or "301 redirect apache" to find out info on how to set that up.
If you've got a lot of URLs to redirect, you'll likely want to generate the list with code (most of the logic should already be there in your handle_redirection(request.path) method).
Once you've run that code and generated the list, you can throw that code away; your webserver will be handling the redirects from the old sites, and your Rails app can happily go on with no knowledge of the previous sites / URLs, and no dangerous catch-all logic in your application controller. A hypothetical one-off generator script is sketched below.
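This sketch assumes handle_redirection returns the target path and that old_paths is however you enumerate the URLs to migrate (both assumptions); the output format suits an nginx map block:

# One-off script: dump "old_path new_path;" pairs for an nginx map file.
File.open('redirects.map', 'w') do |f|
  old_paths.each do |path|
    f.puts "#{path} #{handle_redirection(path)};"
  end
end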
That is a very interesting way to handle exceptions...
In Rails you use rescue_from to handle exceptions on the controller layer:
class ApplicationController < ActionController::Base
  rescue_from SomeError, with: :oh_noes

  private def oh_noes
    render text: 'Oh no.'
  end
end
However, Rails already handles some exceptions by serving static HTML pages (among them ActiveRecord::RecordNotFound), which you can override with dynamic handlers.
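A minimal sketch of such an override via exceptions_app; the errors controller and route are assumptions, not part of the original answer:

# config/application.rb - route exceptions back into the app itself
config.exceptions_app = self.routes

# config/routes.rb
match '/404', to: 'errors#not_found', via: :all

# app/controllers/errors_controller.rb
class ErrorsController < ApplicationController
  def not_found
    render text: 'Not found', status: :not_found
  end
end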
That said, as @joshua.paling already pointed out, you should be handling the redirects at the server level instead of in your application.

Save a response from API call to use in a test so I don't have to continuously repeat requests to API

API requests take too long and are costing me money, both in my Rails integration tests and in my application.
I would like to save API responses and then use that data for testing. Are there any good ways to make that happen?
Also, how can I make fewer api calls in production/development? What kind of caching can I use?
If I understand correctly, your Rails app is using an external API, like a Google/FB/Twitter API, that kind of thing.
Caching the views won't work, because view caching only stores the rendered template, saving the time spent rendering it again; it validates that the cache is warm by hashing the data, so the code will still hit the API to verify that the hashes still match.
For you, the best way is to use a class that does all the API calls and caches them in the Rails cache with a timeout period, because you don't want your cache to be too stale; at the same time you will sacrifice some accuracy for some money (e.g. only do a single call every 5, 15, or 30 minutes, whichever you pick).
Here's a sample of what I have in mind, but you should modify it to match your needs
module ApiWrapper
  class << self
    def some_method(some_key) # if keys are needed, like an id or something
      Rails.cache.fetch("some_method/#{some_key}", expires_in: 5.minutes) do
        # assuming ApiLibrary is the external library handler
        ApiLibrary.call_external_library(some_key)
      end
    end
  end
end
Then in your code, call that wrapper; it will only contact the external API if the stored value in the cache has expired.
The call will be something like this
# assuming 5 is the id or value you want to fetch from the api
ApiWrapper.some_method(5)
You can read more about caching methods from the rails guide for caching
Update:
I just thought of another way: for your testing (like RSpec tests) you could stub the API calls, and this way you'll save the whole API call, unless you are testing the API itself. Using the same API library I wrote above, we can stub ApiLibrary itself:
allow(ApiLibrary).to receive(:some_method).and_return({ data: 'some fake data' })
PS: the hash key data is part of the return value; it's the whole hash, not just the string.
There is a great gem for this called VCR. It allows you to make a single request and keep the response cached, so every time you run the test you will use this saved response.
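A minimal setup sketch, assuming RSpec with WebMock; the cassette name and paths are arbitrary:

# spec/spec_helper.rb
require 'vcr'

VCR.configure do |c|
  c.cassette_library_dir = 'spec/cassettes'
  c.hook_into :webmock
end

# In a test: the first run records the real HTTP interaction to a YAML
# cassette; later runs replay it without touching the network.
VCR.use_cassette('api_library/some_method') do
  ApiLibrary.call_external_library(5)
end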
I would use http://redis.io/ in conjunction with something like jbuilder. So as an example your view would look like:
json.cache! ["cache", "plans_index"] do
json.array! #plans do |plan|
json.partial! plan
end
end
for this controller:
def index
  @plans = Plan.all
end
If you have something that is a show page you can cache it like this:
json.cache! ["cache", "plan_#{params["id"]}"] do
json.extract! #plan, :short_description, :long_description,
end

Cache warming with RABL for JSON templates

Okay, this post is a bit wordy before it gets to the actual question, so the abridged version: it pertains to cache warming using RABL templates. When calling Rabl.render directly vs. going through API calls, the caches generated do not have the same cache keys. When using Rabl.render directly, should I expect the cache keys to match those generated when the same template is rendered via an API call?
K, now the wind-up..
I have a Rails API server on Heroku. I've done a lot of optimization with RABL, using russian doll caching to improve the reuse of the underlying objects in collections. Though I am still left with collection caches that, when generated by the user on first request, are a burden on the experience (e.g. 1+ second API calls).
When debugging a sample API call, I get the following cache actions on a given object.
...api/v1/activities/26600 :
Cache read: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
Cache generate: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
Cache write: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
So for the same object, when calling ...api/v1/activities (after the above call), I get the desired cache hit:
Cache read: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
Cache fetch_hit: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
This works great. The next step is to avoid having the first/any API call spend time generating the cache.
So I have been pursuing cache warming techniques to generate these collections prior to the user accessing them. One suggestion is using wget as a way to hit the API directly ( see https://stackoverflow.com/a/543988/451488 ). But this adds a load on the Heroku web dynos, so I want to background cache warming via sidekiq workers.
RABL provides a way to render a template from code directly ( https://github.com/nesquena/rabl#rendering-templates-directly ) which definitely seems like the right approach for this use case. Therefore my intent is to call the RABL engine through some event prior to an API call (for example - a User login event).
So for the above API example, I'd call the following in rails console and expect a cache hit.
irb(main):002:0> @activity = Activity.find(26600)
irb(main):003:0> Rabl.render(@activity, 'api/v2/activities/show_no_root', :view_path => 'app/views', :format => :json)
Cache read: rabl/activities/26600-20140423170223588554000//hash
Cache generate: rabl/activities/26600-20140423170223588554000//hash
Cache write: rabl/activities/26600-20140423170223588554000//hash
Unexpectedly, I did not get a cache hit, and it is obvious that the cache keys are not the same, since the trailing hash signature is missing. I'm not sure why the cache keys would be different in this case. I am left with no way to warm caches for RABL templates.
UPDATE
Turns out the hash in the trailing cache key is the template hash.
Cache digest for api/v1/activities/_show.rabl: d30440d18014c72014a05319af0626f7
Though this tells me the source of that hash, it's still not clear why calling Rabl::Renderer directly would not use it as well.
I wasn't able to use Rabl::Renderer due to the missing template digest hash in the cache key. However, by creating a Sidekiq worker as below, I am able to warm the cache by calling the API as a background process, which works nicely.
class CacheWarmApi
  include Sidekiq::Worker
  sidekiq_options :queue => :warmers

  def perform(url_helper, args, params = {}, method = 'get')
    if method == 'get'
      session = ActionDispatch::Integration::Session.new(Rails.application)
      session.get(session.send(url_helper, *args), params)
    end
  end
end
For example :
CacheWarmApi.perform_async( :api_v2_expensiveapi_url, args_array , params_hash)
I think this is a bit too heavy of a solution, and I still think there is a solution out there with Rabl::Renderer.
I've managed to accomplish this by calling Rabl.render directly, storing the result in the cache myself (Redis in my case) and using that from the request handler in the controller:
Worker:
# ... do some long work to calculate object_to_render_with
# .
# .
# .
# render the result using Rabl
render_result = Rabl.render(object_to_render_with, rabl_view_relative_path,
                            view_path: File.join(Rails.root, 'app/views'),
                            format: :json) # already the default, being explicit
Redis.current.set(cache_key, render_result, ex: DEFAULT_CACHE_EXPIRY)
Controller:
# Note: the action must not be named "request", or it would shadow
# ActionController's own #request method.
def show
  # check if we have a recent result
  res = Redis.current.get(cache_key)
  if res.nil?
    # No result in cache, start the worker
    Worker.perform_async
  end
  # use the cached result or report that work is in progress
  render json: res || { status: 'in_progress' }
end
Bonus: I also added a layer of progress tracking in the same manner (using another key and another request, updating Redis manually as the work progresses), along the lines sketched below.
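A sketch of that progress layer, following the same key convention (all names illustrative):

# Worker, at milestones during the long computation:
Redis.current.set("#{cache_key}:progress", '42', ex: DEFAULT_CACHE_EXPIRY)

# Controller, polled by the client:
def progress
  render json: { progress: Redis.current.get("#{cache_key}:progress") }
end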
HTH

Caching/Etag for Static Action in Rails 4

Since Rails 4 removed page caching and action caching, I'm wondering what the Rails 4 way is to cache an action that has no variables and has only HTML in the view. Should I fragment cache the static HTML in the view? How do I set an etag/fresh_when when there is no model to set it to expire with? I'm struggling to find an example or convention for caching what should be the easiest page to cache.
One note: while the view is completely static, the page still has a dynamic navbar depending on whether the user is signed in or not. How would you handle a static page like this without resorting to action caching, since it's been removed and the convention has been set not to use the gem version?
Example:
class HomesController < ApplicationController
  def index
  end
end
homes/index.html.erb
<div>A bunch of normal html tags with no erb</div>
Edit:
Based on #severin's answer and my own research, here is what I have come up with so far.
class HomesController < ApplicationController
  def index
    fresh_when(["some-identifier", current_user, flash])
  end
end
In addition, I'm using https://github.com/n8/bust_rails_etags to reset all etags after a deploy, because the view may have changed between deploys. I think this covers the etag fairly well, although I'm still curious whether fresh_when will include some identifier about the view automatically and whether "some-identifier" is necessary. Is it going to be a problem that sometimes current_user and flash will be nil?
Now on the second point of fragment caching the static content. I'm assuming if I did:
cache "v1" do
all my html
end
I'd have to remember to always change the cache identifier when the page is changed, otherwise my app would serve stale content. Is there any way to automate this as well, or is it already handled by Rails? It would be nice to just cache based on the last time the view was updated, or something clever like that, so I don't have to keep track of when my static content changes.
You can set an etag/last-modified date without a model; check the documentation: http://api.rubyonrails.org/classes/ActionController/ConditionalGet.html#method-i-fresh_when
So you could do something like:
def index
  fresh_when(:etag => 'some_made_up_etag', :last_modified => a_long_time.ago, :public => true)
  render
end
Note: you don't need to provide an etag AND a last-modified timestamp; you can provide just an etag or just a last-modified timestamp.
In addition to this, I would also fragment cache the whole content of the view.
Or you could just continue using action-/page_caching using the official plugin/gem: https://github.com/rails/actionpack-page_caching
Some additions regarding the second part of your question:
Rails adds the content of the RAILS_CACHE_ID environment variable to all its cache keys (the etag and the fragment cache key in your example). The bust_rails_etags gem adds another environment variable that affects only the etags... So in your case you could just remove the bust_rails_etags gem and update the RAILS_CACHE_ID environment variable on all your deploys.
You can even automate the updating of the RAILS_CACHE_ID environment variable by adding something like this in config/environment.rb:
code_revision = # ... some piece of code that gets the current revision.
                # I'm using git, and I use the following (crude) snippet
                # to get the current revision:
                # code_revision = `git log --pretty=format:%h -n1`.strip
ENV['RAILS_CACHE_ID'] = code_revision
This way, the current code revision is always added to all cache keys.
