Cache warming with RABL for JSON templates - ruby-on-rails

Okay, this post is a bit wordy before it gets to the actual question, so the abridged version: this is about cache warming with RABL templates. The caches generated by calling Rabl.render directly do not have the same cache keys as the caches generated by API calls. When using Rabl.render directly, should I expect the cache keys to match the ones produced when the same template is rendered via an API call?
K, now the wind-up..
I have a Rails API server on Heroku. I've done a lot of optimization with RABL, using Russian doll caching to improve the reuse of the underlying objects in collections. Still, I am left with collection caches that, when generated on a user's first request, are a burden on the experience (e.g. 1+ second API calls).
When debugging a sample API call, I get the following cache actions on a given object.
...api/v1/activities/26600 :
Cache read: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
Cache generate: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
Cache write: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
So for the same object, when calling ...api/v1/activities (after the above call), I get the desired cache hit:
Cache read: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
Cache fetch_hit: rabl/activities/26600-20140423170223588554000//hash/d30440d18014c72014a05319af0626f7
This works great. The next step is to avoid having the first/any API call spend time generating the cache.
So I have been pursuing cache-warming techniques to generate these collections before the user accesses them. One suggestion is to use wget to hit the API directly (see https://stackoverflow.com/a/543988/451488), but that adds load on the Heroku web dynos, so I want to move cache warming into the background via Sidekiq workers.
RABL provides a way to render a template from code directly (https://github.com/nesquena/rabl#rendering-templates-directly), which definitely seems like the right approach for this use case. My intent, therefore, is to call the RABL engine from some event prior to an API call (for example, a user login event).
So for the above API example, I'd call the following in rails console and expect a cache hit.
irb(main):002:0> @activity = Activity.find(26600)
irb(main):003:0> Rabl.render(@activity, 'api/v2/activities/show_no_root', :view_path => 'app/views', :format => :json)
Cache read: rabl/activities/26600-20140423170223588554000//hash
Cache generate: rabl/activities/26600-20140423170223588554000//hash
Cache write: rabl/activities/26600-20140423170223588554000//hash
Unexpectedly, I did not get a cache hit, and it is obvious the cache keys are not the same, since the trailing hash signature is missing. I'm not sure why the cache keys would be different in this case, and it leaves me with no way to warm caches for RABL templates.
UPDATE
It turns out the trailing hash in the cache key is the template digest:
Cache digest for api/v1/activities/_show.rabl: d30440d18014c72014a05319af0626f7
Though this tells me the source of that hash, it's still not clear why calling Rabl::Renderer directly would not use it as well.

I wasn't able to use Rabl::Renderer due to the missing template digest hash in the cache key. However, by creating a Sidekiq worker as below, I am able to warm the cache by calling the API in a background process, which works nicely.
class CacheWarmApi
  include Sidekiq::Worker
  sidekiq_options :queue => :warmers

  def perform(url_helper, args, params = {}, method = 'get')
    if method == 'get'
      session = ActionDispatch::Integration::Session.new(Rails.application)
      session.get(session.send(url_helper, *args), params)
    end
  end
end
For example:
CacheWarmApi.perform_async( :api_v2_expensiveapi_url, args_array , params_hash)
I think this is a bit too heavy of a solution, and I still think there is a solution out there with Rabl::Renderer.

I've managed to accomplish this by calling Rabl.render(ref) directly, storing the result directly in the cache (Redis in my case), and using it from the request handler in the controller:
Worker:
# ... do some long work to calculate object_to_render_with
# .
# .
# .
# render the result using Rabl
render_result = Rabl.render(object_to_render_with, rabl_view_relative_path,
                            view_path: File.join(Rails.root, 'app/views'),
                            format: :json) # already the default, being explicit
Redis.current.set(cache_key, render_result, ex: DEFAULT_CACHE_EXPIRY)
Controller:
def request
  # check if we have a recent result
  res = Redis.current.get cache_key
  if res.nil?
    # No result in cache, start the worker
    Worker.perform_async
  end
  # use that cached result or an empty result
  render json: res || { status: 'in_progress' }
end
Bonus: I also added a layer of progress tracking in the same manner (using another key and another request, and updating Redis manually as the work progresses).
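A rough sketch of what that progress tracking could look like; the key name, the percent_done variable, and the extra controller action are hypothetical:
# In the worker, somewhere inside the long-running work:
Redis.current.set("#{cache_key}:progress", percent_done, ex: DEFAULT_CACHE_EXPIRY)

# In the controller, a separate status endpoint the client can poll:
def progress
  render json: { progress: Redis.current.get("#{cache_key}:progress") || 0 }
end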
HTH

Related

Prevent bots from accessing rails active_storage images

My site has a large number of graphs which are recalculated each day as new data is available. The graphs are stored on Amazon S3 using active_storage. A typical example would be
# app/models/graph.rb
class Graph < ApplicationRecord
  has_one_attached :plot
end
and in the view
<%= image_tag graphs.latest.plot %>
where graphs.latest retrieves the latest graph. Each day, a new graph and attached plot is created and the old graph/plot is deleted.
A number of bots, including Google's and Yandex's, are indexing the graphs, but they then generate exceptions when they return and access the image again at URLs like
www.myapp.com/rails/active_storage/representations/somelonghash
Is there a way to produce a durable link for the plot that does not expire when the graph/plot is deleted and then recalculated? Failing this, is there a way to block bots from accessing these plots?
Note that I currently have a catchall at the end of the routes.rb file:
get '*all', to: 'application#route_not_found', constraints: lambda { |req|
  req.path.exclude? 'rails/active_storage'
} if Rails.env.production?
The exclusion of active storage in the catchall is in response to this issue. It is tempting to remove the active_storage exemption, but that might then break legitimate active_storage routes.
Maybe I can put something in rack_rewrite.rb to fix this?
Interesting question.
A simple solution would be to use send_data to serve the image directly. However, that can have its own issues, mostly in terms of increased server bandwidth usage (and reduced server performance). Still, a solution like that is what you need if you don't want to go through the trouble described below of creating a redirect model and the logic around it.
Original Answer
The redirect will require setting up some sort of Redirects::Graph model. It basically verifies that a graph was deleted and redirects to the new graph instead of the requested one. It would have two fields, an old_signed_id (biglonghash) and a new_signed_id.
Every time you delete a graph, we'll need to populate the redirects model, and also add a new entry every time a new graph is created (we should be able to generate the signed_id from the blob somehow).
For performance, and to avoid long chains of redirects (which may result in a different error/issue), you'll have to manage the chain as it changes. I.e.: say you now have a redirect A => B, and you delete B, replacing it with C; now you need A => C and B => C (to avoid an A => B => C redirect), and this chain could get rather long indeed. This can be handled efficiently by adding an index on new_signed_id and doing a Redirects::Graph.where(new_signed_id: old_signed_id).update_all(new_signed_id: new_signed_id) to update all the relevant old redirects whenever you re-generate the graph.
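A sketch of the migration such a model might need, under the assumptions above (the table name, column types, and migration version are guesses, not working code):
class CreateRedirectsGraphs < ActiveRecord::Migration[5.2]
  def change
    create_table :redirects_graphs do |t|
      t.string :old_signed_id, null: false # the signed id bots still request
      t.string :new_signed_id, null: false # where those requests should go now
      t.timestamps
    end

    # Looked up on every redirected request.
    add_index :redirects_graphs, :old_signed_id, unique: true
    # Lets the chain-collapsing update_all above run against an index.
    add_index :redirects_graphs, :new_signed_id
  end
end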
The controller itself is trickier; the cleanest method I can think of is to monkey patch ActiveStorage::RepresentationsController to add a before_action that does something like this (it may not work as-is; params[:signed_id] and the representations path may not be right):
before_action :redirect_if_needed

def redirect_if_needed
  redirect_model = Redirects::Graph.find_by(old_signed_id: params[:signed_id])
  redirect_to rails_activestorage_representations_path(
    signed_id: redirect_model.new_signed_id
  ) if redirect_model.present?
end
If you have version control set up for your database (i.e. the PaperTrail gem or something), you may be able to work out the old_signed_id and new_signed_id with a bit of work and build the redirects for the URLs currently causing errors. Otherwise, sadly, this approach will only prevent future errors, and it may be impossible to get the currently broken URLs working.
Ideally, though, you would update the blob itself to use the new graph instead of deleting the old one, but I'm not sure that's possible/practical.
Have you tried this?
In the file /robots.txt, put:
User-agent: *
Disallow: /rails/active_storage*

Save a response from API call to use in a test so I don't have to continuously repeat requests to API

API requests take too long and are costing me money, both in my Rails integration tests and in my application.
I would like to save API responses and then use that data for testing. Are there any good ways to make that happen?
Also, how can I make fewer api calls in production/development? What kind of caching can I use?
If I understand correctly, your Rails app is using an external API, like a Google/FB/Twitter API, that kind of thing.
Caching the views won't help here, because view caching only saves the time spent rendering the template again; the cache key is built by hashing the data, so the code will still hit the API to fetch the data and verify that the hashes still match.
For you, the best way is to use a class that does all the API calls and caches them in the Rails cache with an expiry period. You don't want your cache to be too stale, but at the same time you will sacrifice some accuracy to save some money (e.g. only make a single call every 5, 15, or 30 minutes, whichever you pick).
Here's a sample of what I have in mind, but you should modify it to match your needs:
module ApiWrapper
  class << self
    def some_method(some_key) # if keys are needed, like an id or something
      Rails.cache.fetch("some_method/#{some_key}", expires_in: 5.minutes) do
        # assuming ApiLibrary is the external library handler
        ApiLibrary.call_external_library(some_key)
      end
    end
  end
end
Then in your code, call that wrapper; it will only contact the external API if the stored value in the cache has expired.
The call will be something like this
# assuming 5 is the id or value you want to fetch from the api
ApiWrapper.some_method(5)
You can read more about caching methods in the Rails caching guide.
Update:
I just thought of another way: for your testing (like RSpec tests), you could stub the API calls, and this way you'll skip the whole API call (unless you are testing the API itself). Using the same API library I wrote above, we can stub ApiLibrary itself:
allow(ApiLibrary).to receive(:some_method).and_return({ data: 'some fake data' })
PS: the hash key data is part of the return value; the stub returns the whole hash, not just the string.
There is a great gem for this called VCR. It allows you to make a single request and keep the response cached, so every time you run the test you use this saved response.
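A minimal sketch of wiring VCR up with RSpec and WebMock; the cassette name is arbitrary, and it reuses the ApiWrapper example from the answer above:
# spec/support/vcr.rb
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = 'spec/cassettes' # recorded responses live here
  config.hook_into :webmock                      # intercept HTTP via WebMock
end

# In a spec: the first run makes the real request and records it; later runs
# replay the saved response instead of hitting the API.
it 'fetches data from the external API' do
  VCR.use_cassette('api_library/some_method') do
    expect(ApiWrapper.some_method(5)).to be_present
  end
end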
I would use http://redis.io/ in conjunction with something like jbuilder. So as an example your view would look like:
json.cache! ["cache", "plans_index"] do
json.array! #plans do |plan|
json.partial! plan
end
end
for this controller:
def index
  @plans = Plan.all
end
If you have something that is a show page you can cache it like this:
json.cache! ["cache", "plan_#{params["id"]}"] do
json.extract! #plan, :short_description, :long_description,
end

Caching/Etag for Static Action in Rails 4

Since Rails 4 removed page caching and action caching, I'm wondering what the Rails 4 way is to cache an action that has no variables and has only HTML in the view. Should I fragment cache the static HTML in the view? How do I set an etag/fresh_when when there is no model to expire it with? I'm struggling to find an example or convention for caching what should be the easiest page to cache.
One note is that while the view is completely static, the page still has a dynamic navbar depending on whether the user is signed in or not. How would you handle a static page like this without resorting to action caching, since it's been removed and the convention has been set not to use the gem version?
Example:
class HomesController < ApplicationController
  def index
  end
end
homes/index.html.erb
<div>A bunch of normal html tags with no erb</div>
Edit:
Based on @severin's answer and my own research, here is what I have come up with so far.
class HomesController < ApplicationController
  def index
    fresh_when(["some-identifier", current_user, flash])
  end
end
In addition, I'm using https://github.com/n8/bust_rails_etags to reset all etags after a deploy, because the view may have changed between deploys. I think this covers the etag fairly well, although I'm still curious whether fresh_when will include some identifier for the view automatically and whether "some-identifier" is necessary. Is it going to be a problem that sometimes current_user and flash will be nil?
Now, on the second point of fragment caching the static content: I'm assuming that if I did
cache "v1" do
all my html
end
I'd have to remember to always change the cache identifier when the page is changed; otherwise my app would serve stale content. Is there any way to automate this as well, or is it already handled by Rails? It would be nice to just cache off the last time the view was updated, or something clever, so I don't have to keep track of when my static content changes.
You can set an etag/last-modified date without a model; check the documentation: http://api.rubyonrails.org/classes/ActionController/ConditionalGet.html#method-i-fresh_when
So you could do something like:
def index
  fresh_when(:etag => 'some_made_up_etag', :last_modified => a_long_time.ago, :public => true)
  render
end
Note: you don't need to provide an etag AND a last-modified timestamp; you could provide just an etag, or only a last-modified timestamp.
In addition to this, I would also fragment cache the whole content of the view.
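For example, a minimal sketch using the static view from the question (the "homes_index_static" key is made up):
<%# homes/index.html.erb %>
<% cache "homes_index_static" do %>
  <div>A bunch of normal html tags with no erb</div>
<% end %>
By default in Rails 4 the cache helper also mixes a digest of the template into the fragment key, so editing the view busts the fragment without you having to bump the key by hand.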
Or you could just continue using action-/page_caching using the official plugin/gem: https://github.com/rails/actionpack-page_caching
Some additions regarding the second part of your question:
Rails adds the content of the RAILS_CACHE_ID environment variable to all its cache keys (the etag and the fragment cache key in your example). The bust_rails_etags gem adds another environment variable that affects only the etags... So in your case you could just remove the bust_rails_etags gem and update the RAILS_CACHE_ID environment variable on all your deploys.
You can even automate the updating of the RAILS_CACHE_ID environment variable by adding something like this in config/environment.rb:
# Some piece of code that gets the current revision. I'm using git, and I use
# the following (crude) snippet to get it:
code_revision = `git log --pretty=format:%h -n1`.strip
ENV['RAILS_CACHE_ID'] = code_revision
This way, the current code revision is always added to all cache keys.

Rails caching: Expiring multiple pages for one action

I've set up action caching (with sweepers, but I guess that's irrelevant here) in my app, and so far it works great except for one thing:
I use Kaminari for pagination, and thus when I execute expire_action on my action it only expires the first page. Since I know caching won't work when the page is specified via the query string, I've set up a route so the page is appended to the end of the URL (for example /people/123/page/2).
I'll add more info to this post if necessary, but I'm guessing there is something obvious I'm missing here, so: Anyone know how to expire the rest of my pages?
I'm still interested in an answer to my original question and will change my accepted answer should a solution come up. That said, I ended up caching only the first page, by checking whether a page was specified at all:
caches_action :index, :if => Proc.new { params[:page].nil? }
Here is a solution I've thought of, facing the same problem, though I haven't implemented it yet: cache the actual expiry time in its own key. The key would be a canonical representation of the search URL, i.e. without the "page" parameter. E.g.:
User searches on http://example.com?q=foo&page=3, so params is { q: 'foo', page: 3 }. Strip out "page=3" and we're left with { q: 'foo' }.
Run to_param on it and add some prefix, and we're left with a cache key like search_expiry_q=foo.
Look up the cache for this canonical query, i.e. Rails.cache.read("search_expiry_q=foo"). If it exists, we'll make our result expire at that time. Unfortunately, we only have expires_in, not expires_at, so we'll have to do a calculation, i.e. something like expires_in: expiry_time - Time.now - 5.seconds (the 5 seconds hopefully prevents any race conditions). We cache the full URL/params this way.
OTOH if there's no expiry, then no-one's performed the search recently. So we do:
expiry_time = Time.now + 1.hour
Rails.cache.write("search_expiry_q=foo", expiry_time, expires_in: 1.hour)
And cache this fragment/page, again with full URL/params, and expires_in: 1.hour.
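A rough sketch of the scheme above as a controller/helper method; the names (SEARCH_TTL, canonical_expiry_key, cache_search_page) are made up for illustration, and it assumes params is a plain hash with symbol keys:
SEARCH_TTL = 1.hour

# Canonical key for the whole search, ignoring the page parameter.
def canonical_expiry_key(params)
  "search_expiry_" + params.except(:page).to_param
end

# Fetch (or build) one page of results so that every page of the same search
# expires at the same moment.
def cache_search_page(page_cache_key, params, &block)
  expiry_key  = canonical_expiry_key(params)
  expiry_time = Rails.cache.read(expiry_key)

  if expiry_time.nil?
    # Nobody has run this search recently: open a fresh expiry window.
    expiry_time = Time.now + SEARCH_TTL
    Rails.cache.write(expiry_key, expiry_time, expires_in: SEARCH_TTL)
  end

  # Only expires_in is available, so convert the shared wall-clock expiry into
  # a remaining TTL, with a small margin against race conditions.
  Rails.cache.fetch(page_cache_key, expires_in: expiry_time - Time.now - 5.seconds, &block)
end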

How to render a view normally after using render_to_string?

In my Rails application I have an action which creates an XML document, using an XML Builder (rxml) template and render_to_string. The XML document is forwarded to a backend server.
After creating the XML document I want to send a normal HTML response to the browser, but somehow Rails is remembering the first call to render_to_string.
For example:
Rails cannot find the default view show.html.erb because it looks for a show.rxml.
Simply putting a render 'mycontroller/show.html.erb' at the bottom of my action handler makes Rails find the template, but the browser doesn't render it properly because the response's content type is text/xml.
Is there any way to use render_to_string without "tainting" the actual browser response?
EDIT: It seems that in Rails 2 erase_render_results would do the trick, but in Rails 3 it is no longer available.
The pragmatic answer is that using a view file and two calls to render is Not The Rails Way: views are generally something that is sent to the client, and ActionPack is engineered to work that way.
That said, there's an easy way to achieve what you're trying to do. Rather than using ActionView, you could use Builder::XmlMarkup directly to generate your XML as a string:
def action_in_controller
  buffer = ""
  xml = Builder::XmlMarkup.new(:target => buffer)
  # build your XML - essentially copy your view.xml.builder file here
  xml.element("value")
  xml.element("value")
  # send the contents of buffer to your backend server
  # allow your controller to render your view normally
end
Have a look at the Builder documentation to see how it works.
The other feature of Builder you can take advantage of is that XML content is appended to the target using <<, so any IO stream can be used. Depending on how you're sending content to the other server, you could wrap it all up quite nicely, as sketched below.
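For instance, a sketch of that point only (the file path is arbitrary; a socket or any object responding to << would work the same way):
require 'builder'

# Stream the XML straight into an IO instead of building an in-memory string.
File.open('/tmp/payload.xml', 'w') do |io|
  xml = Builder::XmlMarkup.new(:target => io, :indent => 2)
  xml.instruct!        # <?xml version="1.0" encoding="UTF-8"?>
  xml.element("value") # same placeholder elements as above
  xml.element("value")
end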
Of course, this could end up very messy and long, which is why you'd want to encapsulate this bit of functionality in another class, or as a method in your model.
It seems as if this may be a bug in Rails 3 (at least compared to the behavior of render_to_string in 2.3.x). In the 2.3.8 source they clearly take extra steps to reset the content type and set the response body to nil (among other things):
def render_to_string
  ...
ensure
  response.content_type = nil
  erase_render_results
  reset_variables_added_to_assigns
end
but in the 3.0.3 source for AbstractController::Rendering:
def render_to_string(*args, &block)
  options = _normalize_args(*args, &block)
  _normalize_options(options)
  render_to_body(options)
end
You can see there is no explicit resetting of variables; render_to_body just returns view_context.render. It is possible that the content type, response_body, etc. are handled elsewhere and this is a red herring, but my first instinct would be to set
response.headers['Content-Type'] = 'text/html'
after your render_to_string and before actually rendering.
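A sketch of what that could look like in the action; the template names, the :formats option, and the BackendServer call are assumptions, not tested code:
def show
  # Render the Builder template to a string for the backend server.
  xml_payload = render_to_string(:template => 'mycontroller/show', :formats => [:xml])
  BackendServer.post(xml_payload) # hypothetical forwarding call

  # Undo the content type that render_to_string left behind, then render the
  # normal HTML view for the browser.
  response.headers['Content-Type'] = 'text/html'
  render 'mycontroller/show', :formats => [:html]
end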
In migrating the actionwebservice gem I encountered the same error. In their code they circumvent the double-render exception by calling erase_render_results.
That function is no longer available in Rails 3. Luckily the fix is quite easy (but it took me a while to find).
Inside actionwebservice the following function was called inside a controller to allow a second render:
def reset_invocation_response
  erase_render_results
  response.instance_variable_set :@header, Rack::Utils::HeaderHash.new(::ActionController::Response::DEFAULT_HEADERS.merge("cookie" => []))
end
To make this work in Rails 3, you just have to write:
def reset_invocation_response
  self.instance_variable_set(:@_response_body, nil)
  response.instance_variable_set :@header, Rack::Utils::HeaderHash.new("cookie" => [], 'Content-Type' => 'text/html')
end
Hope this helps.
