Proper Caching Strategy for a Dynamic Action - ruby-on-rails

Probably an obvious question for those of you who have scaled/cached anything before. I haven't, and I'm getting lost in the tutorials and code snippets all over the internet (http://guides.rubyonrails.org/caching_with_rails.html).
I'm deploying to Heroku with Memcached installed and am figuring out the most optimized way to do the following:
1. Query the database to find a post and see if it has been 'flagged'
2. Query a Whitelist to see if a different part of the post has been 'flagged'
3. Query an API to see if they find this user in their system
4. Render a page with a lot of repetitive calls to remote systems for CSS/JS/etc.
I assume #1 happens frequently and changes often. #2 less so. #3 changes infrequently (months), and #4 should only change if #3 changes.
I want to be able to increment flag_count and view_count regularly without hitting a cached version. What mix of page, action and fragment caching should I be doing? Right now, I'm not caching this action at all...
My [simplified] controller code:
def show
  expires_in 12.hours, :public => true
  @post = Post.find(params[:id])
  # CHECK FLAG STATUS
  redirect_to root_path and return if @post.flag?
  # CHECK WHITELIST STATUS
  redirect_to root_path and return if Whitelist.includes?(@post.screen_name)
  # Ping API again on the off chance user deleted/changed account
  if @post && @post.user = get_user_from_api(@post.screen_name)
    @post.increment_views!
    render :layout => false
  else
    redirect_to root_path
  end
end

There are a few small things that might help here. It's tricky, given that there's no way to avoid hitting the app stack on each request.
Fragment Caching
Use fragment caching with memcache to avoid regenerating post/comment content. There might be some gains here if your views are heavy. Memcache objects self-expire and can be keyed to the most recent version of a post with something like:
<% cache @post.cache_key do %>
  <%= @post.formatted_content %>
  ...
<% end %>
Other things to consider
Set up the server to send cache headers with images, JS & CSS
Check out barista to reduce requests & bundle assets

Related

How do I add page caching to my Rails app for visitors?

Say I have a Rails 4.2 site on Heroku with many Posts. When a post is viewed, the application hits the database to get the post's data and associated data from other models. Visitors (non-logged-in users) to my website should see Posts without any customizations, so displaying these pages should not require accessing the database. How can I set up simple page or action caching (or an equivalent) for all visitors to my site, so the database is skipped? Could I also let a CDN take over rendering these pages?
For starters, take a look at the Rails caching guide. Both page and action caching were removed from Rails core in 4.0 and extracted into the actionpack-page_caching and actionpack-action_caching gems.
You can try fragment caching, which will grab the generated HTML of the post and store it away, though you'll need a conditional based on whether or not there is a current_user.
A super-simple version might look like:
app/views/posts/show.html.erb
<% cache_unless(current_user, @post) do %>
  <%= @post.body %>
<% end %>
The cache method is going to be a read action if a cache exists for that post, and it will be a generate and write action if that cache doesn't exist.
Caching with an ActiveRecord object creates a key that includes updated_at by default, so if a post changes, the first visitor that loads the post will have a longer wait time, and every subsequent visitor will get the benefits of caching.
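The effect of a key that includes updated_at can be sketched in plain Ruby, without Rails. Everything here is an assumption made for illustration: the Post struct, the TinyStore class and the exact key format are hypothetical stand-ins, and the key Rails really generates may look different depending on the version.

```ruby
# Hypothetical stand-in for an ActiveRecord model; only the fields used for the key.
Post = Struct.new(:id, :updated_at) do
  # Mimics the idea of a timestamped key: "posts/<id>-<timestamp>"
  def cache_key
    "posts/#{id}-#{updated_at.utc.strftime('%Y%m%d%H%M%S')}"
  end
end

# Hypothetical in-memory store standing in for memcached.
class TinyStore
  attr_reader :misses

  def initialize
    @data = {}
    @misses = 0
  end

  # Return the cached value, or run the block once and remember the result.
  def fetch(key)
    return @data[key] if @data.key?(key)
    @misses += 1
    @data[key] = yield
  end
end

store = TinyStore.new
post = Post.new(1, Time.utc(2015, 1, 1, 12, 0, 0))

store.fetch(post.cache_key) { "<p>rendered body</p>" } # miss: renders and writes
store.fetch(post.cache_key) { "<p>rendered body</p>" } # hit: the block is skipped

post.updated_at = Time.utc(2015, 1, 2, 12, 0, 0)        # the post was edited
store.fetch(post.cache_key) { "<p>new body</p>" }       # new key, so a miss again
```

The old entry is never deleted explicitly; it simply stops being requested, which is why this style of caching relies on the store evicting stale entries on its own.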
As for your other question, there are many other Ruby-based tools for serving static pages, and a review of all those is beyond my knowledge, and probably beyond the scope of this question. :)
Lanny Bose's answer really looks the best, but I'll add this link:
https://devcenter.heroku.com/articles/http-caching-ruby-rails#public-requests
The whole page is useful, but in particular it talks about marking requests as public or private. They are private by default, but if you mark it as public it allows pages to be served by an intermediary proxy cache.
The linked page talks about Rack::Cache + memcache as a proxy cache; I'm not sure if this still exists in rails 4.
This second heroku page talks about action caching, which could cache a specific action for you:
https://devcenter.heroku.com/articles/caching-strategies#action-caching
You don't really want caching here. You just want a conditional in both controller and view, like
# controller
if logged_in?
  @post = Post.find(params[:id])
  # load more records
end
# view
<% if @post %>
  <%= @post.title %>
  ...
<% else %>
  You're not logged in, blah blah.
<% end %>
.. assuming the controller is more than just loading a post record. I don't necessarily agree with conditionally loading the record as above, but your use case may require it.
Alternatively, if you DO require caching for performance reasons, you can just elegantly include the current user id in the fragment cache key like
<% cache "post-#{@post.id}-#{current_user ? current_user.id : 0}" do %>
  <% if logged_in? %>
    ... # expensive operations
  <% else %>
    Please login to see more..
  <% end %>
<% end %>
so that the cache works with the same code for users or anons.

create edit URL route that doesn't show :id

I'm in the midst of trying to clean up my routing. I have a company model that can log in and create applications. They can create several.
Currently this is my setup:
Routes
get 'applications/edit/:id', to: 'applications#edit'
Applications_controller
def edit
  @application = current_company.applications.find(params[:id])
end
def update
  @application = Application.find(params[:id])
  if @application.update(application_params)
    redirect_to dashboard_path
  else
    render 'edit'
  end
end
Each company has its own dashboard. Here's my code from
/dashboard
Your active applications
<% @applications.all.each do |f| %>
  <%= link_to "Application", show_path + "/#{f.id}" %> | <%= link_to "Edit", edit_application_path("#{f.id}") %>
  <br>
<% end %>
Now this all works: if I go to edit_application/11, for example, I see it.
The thing I'd like to change is to remove the :id from the URL,
to make it more secure and give it a nicer feel. It took me 5 minutes to realise I could just change the :id in the URL and edit everything, so I added current_company.applications to stop that. Yet I still don't feel this is very secure.
If you want to remove the :id, you'll still need a way to find the data you want.
As long as you have the url /edit/12 and as long as you use the id 12 in the GET url to find your content, it will show in the browser bar. The only way to "hide" it (but it's not more secure at all, because it's easily found out), is to use a POST request with a form containing a hidden field with the id (can be made in JavaScript).
You are asking the application to get the id from the link in the @applications.all.each loop, but the only way it can do that is to include it somewhere in the request (be it GET, POST, COOKIES/SESSION, ...).
For another (possibly better) solution, read on.
A very common practice is to use slugs: you create a unique key for each piece of content. For example, if your title is "My great app", the slug will be my-great-app. Thus there is no id in your URL (and it cannot be found out if you always use slugs as references). The advantage is that you'll still find a quick match for what you're searching for (by creating a unique index on the slugs).
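As a rough illustration of how such a slug could be derived, here is a minimal plain-Ruby sketch. The helper names slugify and unique_slug are made up for this example, and the list of already-taken slugs stands in for a lookup against the unique index.

```ruby
# Hypothetical helper: turn a title into a URL slug.
def slugify(title)
  title.downcase
       .gsub(/[^a-z0-9]+/, "-") # collapse runs of other characters into one hyphen
       .gsub(/\A-+|-+\z/, "")   # trim leading and trailing hyphens
end

# Hypothetical helper: append -2, -3, ... until the slug is not in `taken`.
def unique_slug(title, taken)
  base = slugify(title)
  slug = base
  n = 2
  while taken.include?(slug)
    slug = "#{base}-#{n}"
    n += 1
  end
  slug
end

slugify("My great app")                       # => "my-great-app"
unique_slug("My great app", ["my-great-app"]) # => "my-great-app-2"
```

The controller would then look the record up by slug instead of id, still scoped to current_company.applications, since a slug hides the number but does not replace the ownership check.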
Some further reading about slugs:
http://rubysnippets.com/2013/02/04/rails-seo-pretty-urls-in-rails/
What is the etymology of 'slug'?

button to save current page in rails 3.2

I need a button to save the current web page (just like clicking "Save as"). I created a method in the controller which works great for any external site (like http://www.google.com) but doesn't work for pages inside my application: I get a timeout error! I can't explain it :(
Any clue what is the issue?
#CONTROLLER FILE
def save_current_page
  # => Using MECHANIZE
  agent = Mechanize.new
  page = agent.get request.referer
  send_data(page.content, :filename => "filename.txt")
end
I tried also Open URI, same problem!
#CONTROLLER FILE
def save_current_page
  # => USING OPEN URI
  send_data(open(request.referer).read, :filename => "filename.txt")
end
I'm using rails 3.2 and ruby 1.9, any help is appreciated, I already spent like 10 hours trying to make it work!!
A single Rails process running a single-threaded server can only handle one request at a time. It's a never-ending standoff between the two requests: the first request is waiting for the second request, but the second request cannot start until the first one finishes, and therefore you get a Timeout error. Even if you're running multiple instances of the app with Passenger or something, it's a bad idea.
The only way I can think to get around it would be to use conditional statements like so:
referer = URI.parse(request.referer)
if Rails.application.config.default_url_options[:host] == referer.host
  content = "via yoursite.com"
else
  agent = Mechanize.new
  page = agent.get request.referer
  content = page.content
end
send_data content, filename: "filename.txt"
A little dirty, but it should get around the Timeout problem. As far as getting the actual content of a page from your own site, that's up to you. You could either render the template, grab something from cache, or just ignore it.
A much better solution would be to enqueue this code into something like Resque or Delayed Job. Then the queue could make the request and wait in line to request the page like normal. It would also mean that the user wouldn't have to wait while your application makes a remote request, which is dangerous because who knows how long the page will take to respond.
After several hours and lots of other posts I got to a final solution:
Bricker is right in that it is not possible for rails to render more than once in a call, as taken from http://guides.rubyonrails.org/layouts_and_rendering.html "Can only render or redirect once per action"
The site also states "The rule is that if you do not explicitly render something at the end of a controller action, Rails will automatically look for the action_name.html.erb template in the controller’s view path and render it."
Then, the solution that worked great for me was to tell the controller to render to a string if a download flag (downloadexcel=true) was set in params (I also use request.url so that it works from any view in my application).
View:
= link_to 'Download', request.url+"&downloadexcel=true", :class => 'btn btn-primary btn-block'
Controller:
def acontrolleraction
  # some controller code here
  if params[:downloadexcel]
    save_page_xls
  else
    # render normally
  end
end
def save_page_xls
  # TRESCLOUD - we create a proper name for the file
  path = URI(request.referer).path.gsub(/[^0-9a-z]/i, '')
  query = URI(request.referer).query.gsub(/[^0-9a-z]/i, '')
  filename = @project_data['NOMBRE']+"_"+path+"_"+query+".xls"
  # TRESCLOUD - we render the page into a variable and process it
  page = render_to_string
  # TRESCLOUD - we send the file for download!
  send_data(page, :filename => filename, :type => "application/xls")
end
Thanks for your tips!

How can I cache a page manually in RoR?

I am trying to create a site in RoR and have enabled caching for some pages and actions. The related DB may not be accessible all the time, so using the cache is very much required. Hence I can't wait for someone to actually visit the page, render it and then cache it. Instead I want whatever is cacheable to be cached manually, programmatically. Is that actually possible, or is caching completely automatic in RoR?
The lazy* solution would be to visit the page as part of your deployment process with lynx, or even curl. That would trigger the cache event from the outside, but at a time of your choosing.
(*) lazy in a good way, I hope.
Check out this page_cache plugin. Seems like this is what you need.
I am triggering caching manually now, and it looks like you can use the built-in API of the actionpack-page_caching plugin to create the page cache yourself. You need to call cache_page(content, path, extension = nil, gzip = Zlib::BEST_COMPRESSION) (see line 80 of https://github.com/rails/actionpack-page_caching/blob/master/lib/action_controller/caching/pages.rb). Here is a sample action that iterates over a collection and caches the "show" page of each item.
def precompile
  @pages = Page.all
  @pages.each do |page|
    @page = page
    cache_page(render_to_string(template: 'pages/show'), url_for(action: :show, id: @page, only_path: true))
  end
  redirect_to '/'
end
The url_for(action: :show, id: @page, only_path: true) part of my code is not very clean, but it is the first version that works the way I need; any refactoring is welcome.
Also, this code will overwrite the cache file every time it is fired, without checking for any changes or expirations.
Ref :- Link
class ProductsController < ActionController::Base
  caches_page :index
  def index
  end
end
Set perform_caching to true in your environment config, e.g. config/environments/development.rb:
config.action_controller.perform_caching = true

Best way to combine fragment and object caching for memcached and Rails

Let's say you have a fragment of the page which displays the most recent posts, and you expire it in 30 minutes. I'm using Rails here.
<% cache("recent_posts", :expires_in => 30.minutes) do %>
...
<% end %>
Obviously you don't need to do the database lookup to get the most recent posts if the fragment exists, so you should be able to avoid that overhead too.
What I'm doing now is something like this in the controller which seems to work:
unless Rails.cache.exist? "views/recent_posts"
  @posts = Post.find(:all, :limit=>20, :order=>"updated_at DESC")
end
Is this the best way? Is it safe?
One thing I don't understand is why the key is "recent_posts" for the fragment and "views/recent_posts" when checking later, but I came up with this after watching memcached -vv to see what it was using. Also, I don't like the duplication of manually entering "recent_posts", it would be better to keep that in one place.
Ideas?
Evan Weaver's Interlock Plugin solves this problem.
You can also implement something like this yourself easily if you need different behavior, such as more fine grained control. The basic idea is to wrap your controller code in a block that is only actually executed if the view needs that data:
# in FooController#show
@foo_finder = lambda{ Foo.find_slow_stuff }
# in foo/show.html.erb
cache 'foo_slow_stuff' do
  @foo_finder.call.each do
    ...
  end
end
If you're familiar with the basics of Ruby metaprogramming, it's easy enough to wrap this up in a cleaner API to your taste.
This is superior to putting the finder code directly in the view:
keeps the finder code where developers expect it by convention
keeps the view ignorant of the model name/method, allowing more view reuse
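The deferred-finder idea can be tried outside Rails with a small sketch. FragmentCache, SlowQuery and render_show are hypothetical names invented for this example; they stand in for the fragment store, the slow model query and the view, respectively.

```ruby
# Hypothetical in-memory fragment cache standing in for memcached.
class FragmentCache
  def initialize
    @fragments = {}
  end

  # Run the block only when the fragment is missing, like the view's cache helper.
  def cache(key)
    return @fragments[key] if @fragments.key?(key)
    @fragments[key] = yield
  end
end

# Hypothetical slow query that counts how often it actually runs.
class SlowQuery
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def run
    @calls += 1
    %w[first second]
  end
end

# "View": the finder is only called inside the cache block.
def render_show(cache, finder)
  cache.cache("foo_slow_stuff") do
    finder.call.map { |item| "<li>#{item}</li>" }.join
  end
end

# "Controller": wrap the query in a lambda instead of running it.
query  = SlowQuery.new
finder = lambda { query.run }
cache  = FragmentCache.new

render_show(cache, finder) # cache miss: the query runs
render_show(cache, finder) # cache hit: the query is skipped
```

Because the check and the query both happen inside the same cache call, there is no separate existence test in the controller that could disagree with what the view sees.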
I think cache_fu might have similar functionality in one of its versions/forks, but I can't recall specifically.
The advantage you get from memcached is directly related to your cache hit rate. Take care not to waste your cache capacity and cause unnecessary misses by caching the same content multiple times. For example, don't cache a set of record objects as well as their html fragment at the same time. Generally fragment caching will offer the best performance, but it really depends on the specifics of your application.
What happens if the cache expires between the time you check for it in the controller and the time it's being checked in the view rendering?
I'd make a new method in the model:
class Post
  def self.recent(count)
    find(:all, :limit=> count, :order=>"updated_at DESC")
  end
end
then use that in the view:
<% cache("recent_posts", :expires_in => 30.minutes) do %>
  <% Post.recent(20).each do |post| %>
    ...
  <% end %>
<% end %>
For clarity, you could also consider moving the rendering of a recent post into its own partial:
<% cache("recent_posts", :expires_in => 30.minutes) do %>
<%= render :partial => "recent_post", :collection => Post.recent(20) %>
<% end %>
You may also want to look into
Fragment Cache Docs
Which allow you to do this:
<% cache("recent_posts", :expires_in => 30.minutes) do %>
...
<% end %>
Controller
unless fragment_exist?("recent_posts")
  @posts = Post.find(:all, :limit=>20, :order=>"updated_at DESC")
end
Although I admit the issue of DRY still rears its head needing the name of the key in two places. I usually do this similar to how Lars suggested but it really depends on taste. Other developers I know stick with checking fragment exist.
Update:
If you look at the fragment docs, you can see how it gets rid of needing the view prefix:
# File vendor/rails/actionpack/lib/action_controller/caching/fragments.rb, line 33
def fragment_cache_key(key)
ActiveSupport::Cache.expand_cache_key(key.is_a?(Hash) ? url_for(key).split("://").last : key, :views)
end
Lars makes a really good point about there being a slight chance of failure using:
unless fragment_exist?("recent_posts")
because there is a gap between when you check the cache and when you use the cache.
The plugin that jason mentions (Interlock) handles this very gracefully by assuming that if you are checking for existence of the fragment, then you will probably also use the fragment and thus caches the content locally. I use Interlock for these very reasons.
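The gap between the check and the use can be reproduced with a toy store and a fake clock. This is only a sketch under stated assumptions: ExpiringStore is a hypothetical class written for this example (it is not Interlock or the Rails cache store), and time is a plain integer number of seconds.

```ruby
# Hypothetical store with expiry and an injectable clock (a callable returning seconds).
class ExpiringStore
  def initialize(clock)
    @clock = clock
    @data = {}
  end

  def write(key, value, expires_in:)
    @data[key] = [value, @clock.call + expires_in]
  end

  # Returns nil for missing or expired entries.
  def read(key)
    value, expires_at = @data[key]
    return nil if expires_at.nil? || @clock.call >= expires_at
    value
  end

  def exist?(key)
    !read(key).nil?
  end

  # Check and compute in one step, so the caller never sees the gap.
  def fetch(key, expires_in:)
    cached = read(key)
    return cached unless cached.nil?
    value = yield
    write(key, value, expires_in: expires_in)
    value
  end
end

now = 0
store = ExpiringStore.new(-> { now })
store.write("recent_posts", "<ul>...</ul>", expires_in: 1800)

now = 1799
skip_query = store.exist?("recent_posts") # controller: fragment exists, so skip the query
now = 1801                                 # the fragment expires before the view renders
fragment = store.read("recent_posts")      # view: nothing cached, and no posts were loaded

# With fetch, the data is produced at the moment the miss is detected.
fragment = store.fetch("recent_posts", expires_in: 1800) { "<ul>rebuilt</ul>" }
```

The window is small in practice, but when it is hit the view has neither a fragment nor the records it needs to rebuild one.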
Just a thought:
In the application controller, define
def when_fragment_expired( name, time_options = nil )
  # idea of avoiding race conditions
  # downside: needs 2 cache lookups
  # in view we actually cache indefinitely
  # but we expire with a 2nd fragment in the controller which is expired time based
  return if ActionController::Base.cache_store.exist?( 'fragments/' + name ) && ActionController::Base.cache_store.exist?( fragment_cache_key( name ) )
  # the time fragment cache uses different time options
  time_options = time_options - Time.now if time_options.is_a?( Time )
  # set an artificial fragment which expires after given time
  ActionController::Base.cache_store.write("fragments/" + name, 1, :expires_in => time_options )
  ActionController::Base.cache_store.delete( "views/"+name )
  yield
end
Then in any action use
def index
  when_fragment_expired "cache_key", 5.minutes do
    @object = YourObject.expensive_operations
  end
end
In the view
cache "cache_key" do
  view_code
end

Resources