I'm trying to figure out the best way to organize the caching system for my scenario:
My web app has "trending movies," which are basically like Twitter's trending topics -- the popular topics of conversation. I've written the function Movie.trending, which returns an array of 5 Movie objects. However, since calculating the trending movies is fairly CPU intensive and it will be shown on every page, I want to cache the result and let it expire after 5 minutes. Ideally, I'd like to be able to call Movie.trending from anywhere in the code and assume that caching will work how I expect it to -- if the cached results are more than 5 minutes old, recalculate them; otherwise, serve the cached results.
Is fragment caching the right choice for a task like this? Are there any additional gems I ought to be using? I'm not using Heroku.
Thanks!
To cache the result of this model method you can use Rails.cache.fetch; see the example below:
# model - app/models/movie.rb
class Movie
  def self.trending
    Rails.cache.fetch("trending_movies", :expires_in => 5.minutes) do
      # CPU intensive operations
    end
  end
end
# helper - app/helpers/application_helper.rb
module ApplicationHelper
  def trending_movies
    content_tag :div do
      # format the cached Movie objects however your markup needs them
      Movie.trending
    end
  end
end
# view - app/views/shared/_trending_movies
<%= trending_movies %>
To test it in development mode, don't forget to turn on caching for that environment.
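For reference, here is a minimal sketch of what that might look like in config/environments/development.rb; the exact options depend on your Rails version (newer versions also provide a rails dev:cache toggle), so treat this as an assumption rather than a drop-in snippet:
# config/environments/development.rb
Rails.application.configure do
  # let Rails.cache.fetch actually store results in development
  config.action_controller.perform_caching = true
  config.cache_store = :memory_store
end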
Currently I am building an application that allows users to place bids on products and admins to approve them. The 'transactions' themselves take place outside of the scope of the application. Currently, users see the price of an asset on the transaction/new page and submit a bid by submitting the form. Admins click a button to approve the bid.
class TransactionsController < ApplicationController
  before_action :get_price

  def new
    @price = get_price
    @transaction = Transaction.new
  end

  ###

  def get_price
    @price = <<Some External Query>>
  end

  def approve
    t = Transaction.find(params[:id])
    t.status = "Approved"
    t.save!
  end
end
Obviously this is not ideal. I don't want to query the API every time a user wants to submit a bid. Ideally, I could query this API every 5-10 seconds in the background and use the price in that manner. I have looked at a couple of techniques for running background jobs, including delayed_job, sidekiq, and resque. For example, in Sidekiq I could run something like this:
# app/workers/price_worker.rb
class PriceWorker
  include Sidekiq::Worker

  def perform(*args)
    get_price
  end

  def get_price
    @price = <<Some External Query>>
  end
end
# config/initializers/sidekiq.rb
schedule_file = "config/schedule.yml"

if File.exists?(schedule_file) && Sidekiq.server?
  Sidekiq::Cron::Job.load_from_hash YAML.load_file(schedule_file)
end
# config/schedule.yml
my_price_job:
  cron: "*/10 * * * * *"
  class: "PriceWorker"
That code runs. The problem is I am kind of lost on how to handle the price variable and pass it back to the user from the worker. I have watched the Railscasts episodes for both Sidekiq and Resque. I have written background workers and jobs that queue and run properly, but I cannot figure out how to implement them into my application. This is the first time I have dealt with background jobs, so I have a lot to learn. I have spent some time researching this issue, and it seems like background jobs are used more for longer-running tasks like updating db indexes than for constantly recurring jobs (like an API request every 5 seconds).
So to sum up: what is the proper technique for running a constantly recurring task, such as querying an external API, in Rails? Any feedback on how to do this properly will be greatly appreciated! Thank you.
That is not how background jobs work. You're right, you have a lot of reading up to do. Think of running an asynchronous job processor like Sidekiq as running an entirely separate app. It shares the same code base as your Rails app but it runs completely separately. If you want these two separate apps to talk to each other then you have to design and write that code.
For example, I would define a cache with reader and writer methods, then have the cache populated when necessary:
1. someone loads product "foo" for the first time on your site
2. Rails checks the cache and finds it empty
3. Rails calls the external service
4. Rails saves the external service response to the cache using its writer method
5. Rails returns the cached response to the client
The cache would be populated thereafter by Sidekiq:
1. someone loads product "foo" for the second time on your site
2. Rails checks the cache and finds the cached value from above
3. Rails fires a Sidekiq job telling it to refresh the cache
4. Rails returns the cached response to the client
Continuing from step 3 above:
1. Sidekiq checks to see when the cache was last refreshed; if it was more than x seconds ago it continues, else it quits
2. Sidekiq calls the external service
3. Sidekiq saves the external service response to the cache using its writer method
When the next client loads product "foo", Rails will read the cache that was updated (or not updated) by Sidekiq.
With this type of system, the cache must be an external store of some kind, like a relational database (MySQL, Postgres, SQLite) or a NoSQL store (Redis, Memcached). You cannot use the in-memory Rails cache, because it exists only within the memory space of the Rails app and is not readable by Sidekiq (since Sidekiq runs as a totally separate app).
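A minimal sketch of what such a shared cache might look like, using Redis as the external store -- the class name, key names, and staleness threshold below are assumptions for illustration, not part of the answer above:
# app/models/price_cache.rb -- hypothetical reader/writer around a Redis store shared by Rails and Sidekiq
class PriceCache
  KEY         = "product_price".freeze
  STALE_AFTER = 10 # seconds

  def self.redis
    @redis ||= Redis.new # both the Rails app and the Sidekiq process connect to the same Redis server
  end

  # reader: returns the cached price, or nil if nothing has been cached yet
  def self.read
    value = redis.get(KEY)
    value && value.to_f
  end

  # writer: stores the price along with the time it was fetched
  def self.write(price)
    redis.set(KEY, price)
    redis.set("#{KEY}:refreshed_at", Time.now.to_i)
  end

  def self.stale?
    Time.now.to_i - redis.get("#{KEY}:refreshed_at").to_i > STALE_AFTER
  end
end
Under this sketch the controller calls PriceCache.read (falling back to the external query plus PriceCache.write on a miss), while the Sidekiq worker calls PriceCache.write only when PriceCache.stale? is true.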
I guess in this case you should use the Rails cache. Put something like this in your controller:
@price = Rails.cache.fetch('price') do
  <<Some external query>>
end
You can also configure the cache expiration by setting the expires_in argument; see https://apidock.com/rails/ActiveSupport/Cache/Store/fetch for more information.
Regarding using background jobs to update your "price" value: you would need to store the retrieved data anyway (in some kind of database) and fetch it in your controller.
Assume you want to do some low level caching in Rails (with memcached, for example) and that you'd like to have just 1 call somewhere in your app, like...
Rails.cache.fetch('books', expires_in: 1.day) do
  Book.offset(offset)
      .limit(limit)
      .select('title, author, number_of_pages')
      .all
end
...to warm up your cache when you boot your app, so you can just use a simple call like...
Rails.cache.read('books')
...anywhere and multiple times throughout your app (in views, controllers, helpers...) to access this "books" collection.
Where should one put the initial "fetch" call to make it work?
After your comment I want to clear up a couple of things.
You should always be using fetch if you require a result to come back. Wrap the call in a class method inside Book for easy access:
class Book
  def self.cached_books
    Rails.cache.fetch < ... >
  end
end
You can have a different method forcing the cache to be recreated:
def self.write_book_cache
  Rails.cache.write < ... >
end
Then in your initializer, or in a rake task, you can just do:
Book.write_book_cache
This seems more maintainable to me, while keeping the succinct call to the cache in the rest of your code.
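To make that concrete, here is a hedged sketch of what the two methods might look like once filled in; the cache key, expiry, and query are assumptions added for illustration, not part of the answer above:
class Book < ActiveRecord::Base
  BOOKS_CACHE_KEY = 'books'.freeze

  def self.cached_books
    Rails.cache.fetch(BOOKS_CACHE_KEY, expires_in: 1.day) { book_collection }
  end

  def self.write_book_cache
    Rails.cache.write(BOOKS_CACHE_KEY, book_collection, expires_in: 1.day)
  end

  def self.book_collection
    # to_a forces the query to run, so real records (not a lazy relation) end up in the cache
    select('title, author, number_of_pages').limit(100).to_a
  end
end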
My first thought would be to put it in an initializer - probably one specifically for the purpose (/config/initializers/init_cache.rb, or something similar).
It should be executed automatically (by virtue of being in the initializers folder) when the app starts up.
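For example, a minimal sketch of such an initializer, reusing the write_book_cache idea from the other answer (the file name and warm-up call are illustrative assumptions):
# config/initializers/init_cache.rb -- warm the cache once the app has booted
Rails.application.config.after_initialize do
  Book.write_book_cache
end
Wrapping the call in after_initialize ensures the models are loaded first; you may also want to guard it so it doesn't run during rake tasks that execute before the books table exists.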
I've got a Document model, basically, which has_many Pages, and I have a view where I need to enumerate a bunch of documents (e.g. 300) and make a button for each page. (I want to do pagination client side using the DataTables jQuery plugin so that the table can be sortable and searchable.) The problem is that if I try to enumerate all the buttons for each Page in each Document, it takes over 10 seconds to render, which is just not useful.
Is there any 'trick' to doing this kind of nested collection rendering fast? Should I just cache the fragments for each document (they don't change much once they're created)? Or is this just a bad situation for Rails partials and would my best bet be to do some client side rendering as part of the pagination in DataTables?
Edit: I'm already including associations so that I don't have an N+1 query problem; that's not the issue. And I tried caching, and it seems like that might be my solution for now, because this index page gets reloaded often between every few documents being added, so it never has to rebuild the full cache of all the partials at once.
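(For context, a hedged sketch of what per-document fragment caching could look like in the view -- the partial and file names are assumptions; on Rails 4+ cache document keys the fragment on the record, and Rails 5+ offers the terser cached: true collection option:)
<%# app/views/documents/index.html.erb -- hypothetical per-document fragments %>
<% @documents.each do |document| %>
  <% cache document do %>
    <%= render 'documents/document', document: document %>
  <% end %>
<% end %>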
The immediate need for caching seems like a code smell. In lieu of guessing, have you tried profiling? (e.g. http://hiltmon.com/blog/2012/02/27/quick-and-dirty-rails-performance-profiling/)
# Gemfile
gem 'ruby-prof'
Define a profiler:
# /lib/development_profiler.rb
class DevelopmentProfiler
  def self.prof(file_name)
    RubyProf.start
    yield
    results = RubyProf.stop

    # make sure the output directory exists before writing to it
    FileUtils.mkdir_p("#{Rails.root}/tmp/performance")

    # Print a call graph to HTML
    File.open "#{Rails.root}/tmp/performance/#{file_name}-graph.html", 'w' do |file|
      RubyProf::GraphHtmlPrinter.new(results).print(file)
    end

    # Print a flat profile (with line numbers) to text
    File.open "#{Rails.root}/tmp/performance/#{file_name}-flat.txt", 'w' do |file|
      # RubyProf::FlatPrinter.new(results).print(file)
      RubyProf::FlatPrinterWithLineNumbers.new(results).print(file)
    end

    # Print a call stack to HTML
    File.open "#{Rails.root}/tmp/performance/#{file_name}-stack.html", 'w' do |file|
      RubyProf::CallStackPrinter.new(results).print(file)
    end
  end
end
Wrap your view-logic with...
DevelopmentProfiler::prof("render-profiling") do
  # Your slow code here
end
Edit - An additional thought: because your model's data is not likely to change, it may be better to eat the rendering cost once. You could statically generate the whole rendered page in an after_save callback, then just serve that single file for subsequent requests.
Though if warming a more traditional cache isn't a huge inconvenience, this may be more trouble than it's worth.
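A rough sketch of that "render once in a callback" idea -- the partial, output path, and ApplicationController.render (Rails 5+; older apps would reach for render_to_string) are all assumptions, not something prescribed above:
class Document < ActiveRecord::Base
  has_many :pages
  after_save :write_static_fragment

  private

  # render the document's markup once and park it on disk for later requests
  def write_static_fragment
    html = ApplicationController.render(partial: 'documents/document',
                                        locals: { document: self })
    path = Rails.root.join('public', 'cached_documents', "#{id}.html")
    FileUtils.mkdir_p(path.dirname)
    File.write(path, html)
  end
end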
What I'm doing
I'm using the twitter gem (a Ruby wrapper for the Twitter API) in my app, which is run on Heroku. I use Heroku's Scheduler to periodically run caching tasks that use the twitter gem to, for example, update the list of retweets for a particular user. I'm also using delayed_job so scheduler calls a rake task, which calls a method that is 'delayed' (see scheduler.rake below). The method loops through "authentications" (for users who have authenticated twitter through my app) to update each authorized user's retweet cache in the app.
My question
What am I doing wrong? For example, since I'm using Heroku's Scheduler, is delayed_job redundant? Also, you can see I'm not catching (rescuing) any errors. So, if Twitter is unreachable, or if a user's auth token has expired, everything chokes. This is obviously dumb and terrible because if there's an error, the entire thing chokes and ends up creating a failed delayed_job, which causes ripple effects for my app. I can see this is bad, but I'm not sure what the best solution is. How/where should I be catching errors?
I'll put all my code (from the scheduler down to the method being called) for one of my cache methods. I'm really just hoping for a bulleted list (and maybe some code or pseudo-code) berating me for poor coding practice and telling me where I can improve things.
I have seen this SO question, which helps me a little with the begin/rescue block, but I could use more guidance on catching errors, and on the higher-level "is this a good way to do this?" plane.
Code
Heroku Scheduler job:
rake update_retweet_cache
scheduler.rake (in my app)
task :update_retweet_cache => :environment do
  Tweet.delay.cache_retweets_for_all_auths
end
Tweet.rb, cache_retweets_for_all_auths method:
def self.cache_retweets_for_all_auths
  @authentications = Authentication.find_all_by_provider("twitter")
  @authentications.each do |authentication|
    authentication.user.twitter.retweeted_to_me(include_entities: true, count: 200).each do |tweet|
      # Actually build the cache - this is good - removing to keep this short
    end
  end
end
User.rb, twitter method:
def twitter
authentication = Authentication.find_by_user_id_and_provider(self.id, "twitter")
if authentication
#twitter ||= Twitter::Client.new(:oauth_token => authentication.oauth_token, :oauth_token_secret => authentication.oauth_secret)
end
end
Note: As I was posting this, I noticed that I'm finding all "twitter" authentications in the "cache_retweets_for_all_auths" method, then calling the "User.twitter" method, which specifically limits to "twitter" authentications. This is obviously redundant, and I'll fix it.
First, what is the exact error you are getting, and what do you want to happen when there is an error?
Edit:
If you just want to catch the errors and log them then the following should work.
def self.cache_retweets_for_all_auths
  @authentications = Authentication.find_all_by_provider("twitter")
  @authentications.each do |authentication|
    begin
      authentication.user.twitter.retweeted_to_me(include_entities: true, count: 200).each do |tweet|
        # Actually build the cache - this is good - removing to keep this short
      end
    rescue => e
      # Either create an object where the error is logged, or output it to whatever log you wish.
    end
  end
end
This way, when it fails it will keep moving on to the next user but will still make a note of the error. Most of the time with Twitter it's just better to do something like this than to try to handle each error on its own. I have seen so many weird things out of the Twitter API, and so many random errors, that trying to track down every one almost always turns into a wild goose chase, though it is still good to keep track just in case.
Next, for when you should use what.
You should use a scheduler when you need something to happen based on time only, and delayed jobs when it's based on a user action but the 'action' you are going to delay would take too long for a normal response. Sometimes you can just put the thing plainly in the controller.
So in other words
The scheduler will be fine as long as the time between updates, X, is greater than the time it takes for the update to happen, Y.
If X < Y then you might want to look at calling the logic from the controller when each individual entry is accessed, instead of trying to do them all at once. The idea is that you would only update it after a certain amount of time has passed. You could store the last update time either on the model itself, in a field like twitter_update_time, or in a Redis or Memcached instance at a unique key for the user/auth (a rough sketch of this follows below).
But if the individual update itself still takes too long, that's when you should do the above, but instead of doing the actual update, call a delayed job.
You could even set it up so that it only updates, or only calls the delayed job, after a certain number of views, to further limit things.
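Here is a hedged sketch of that "refresh on access, only when stale" idea; the controller name, the twitter_updated_at column, and the threshold are assumptions layered on top of the question's code, and the delayed job itself would be responsible for touching twitter_updated_at after a successful refresh:
class TimelineController < ApplicationController
  STALE_AFTER = 10.minutes

  def show
    authentication = Authentication.find_by_user_id_and_provider(current_user.id, "twitter")

    if authentication && (authentication.twitter_updated_at.nil? ||
                          authentication.twitter_updated_at < STALE_AFTER.ago)
      # kick off the refresh in the background; this request is not blocked by the Twitter API
      Tweet.delay.cache_retweets_for_all_auths
    end

    # serve whatever has already been cached, even if a refresh is in flight
    @tweets = Tweet.where(user_id: current_user.id)
  end
end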
Possible Fancy Pants
Or if you want to get really fancy, you could still do it as a cron job, but have a point system based on views that weights which entries should be updated. The idea is that certain actions would add points to certain users, and if their points are over a certain amount you update them and then remove their points. That way you could target the ones you think are the most important, have the most traffic, or show up in the most search results, and so on.
Next, one nit-picky thing.
http://api.rubyonrails.org/classes/ActiveRecord/Batches.html
You should be using
@authentications.find_each do |authentication|
instead of
@authentications.each do |authentication|
find_each pulls in only 1000 entries at a time, so if you end up with a lot of Authentications you don't end up pulling a crazy number of records into memory.
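One caveat, as an aside: find_each is defined on models and ActiveRecord relations, and the dynamic finder find_all_by_provider returns a plain Array, so to actually get batching you would typically rewrite the lookup as a where query, roughly:
# batching only works on a relation, not on the Array the dynamic finder returns
Authentication.where(provider: "twitter").find_each do |authentication|
  # ...
end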
What is the best way to profile a controller action in Ruby on Rails? Currently I am using the brute-force method of throwing in puts Time.now calls around what I think will be a bottleneck. But that feels really, really dirty. There has got to be a better way.
I picked up this technique a while back and have found it quite handy.
When it's in place, you can add ?profile=true to any URL that hits a controller. Your action will run as usual, but instead of delivering the rendered page to the browser, it'll send a detailed, nicely formatted ruby-prof page that shows where your action spent its time.
First, add ruby-prof to your Gemfile, probably in the development group:
group :development do
  gem "ruby-prof"
end
Then add an around filter to your ApplicationController:
around_action :performance_profile if Rails.env == 'development'
def performance_profile
  if params[:profile] && result = RubyProf.profile { yield }
    out = StringIO.new
    RubyProf::GraphHtmlPrinter.new(result).print out, :min_percent => 0
    self.response_body = out.string
  else
    yield
  end
end
Reading the ruby-prof output is a bit of an art, but I'll leave that as an exercise.
Additional note by ScottJShea:
If you want to change the measurement type, place this:
RubyProf.measure_mode = RubyProf::GC_TIME # example
before the if in the performance_profile method of ApplicationController. You can find a list of the available measurements on the ruby-prof page. As of this writing the memory and allocations data streams seem to be corrupted (see defect).
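For reference, the modes commonly available in older ruby-prof releases look roughly like the list below; the exact set varies by version, so check the gem's documentation before relying on any of them:
RubyProf.measure_mode = RubyProf::WALL_TIME    # wall-clock time (the usual default)
RubyProf.measure_mode = RubyProf::PROCESS_TIME # CPU time used by the process
RubyProf.measure_mode = RubyProf::ALLOCATIONS  # object allocations
RubyProf.measure_mode = RubyProf::MEMORY       # memory usage
RubyProf.measure_mode = RubyProf::GC_TIME      # time spent in garbage collection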
Use the Benchmark standard library and the various tests available in Rails (unit, functional, integration). Here's an example:
def test_do_something
  elapsed_time = Benchmark.realtime do
    100.downto(1) do |index|
      # do something here
    end
  end
  assert elapsed_time < SOME_LIMIT
end
So here we just do something 100 times, time it via the Benchmark library, and ensure that it took less than SOME_LIMIT amount of time.
You also may find these links useful: The Benchmark.realtime reference and the Test::Unit reference. Also, if you're into the 'book reading' thing, I picked up the idea for the example from Agile Web Development with Rails, which talks all about the different testing types and a little on performance testing.
There's a Railscast on profiling that's well worth watching
http://railscasts.com/episodes/98-request-profiling
You might want to give the FiveRuns TuneUp service a try, as it's really rather impressive. Disclaimer: I'm not associated with FiveRuns in any way, I've just tried this service out.
TuneUp is a free service whereby you download a plugin and when you run your application it injects a panel at the top of the screen that can be expanded to display detailed performance metrics.
It gives you some nice graphs, including one that shows what proportion of time is spent in the Model, View and Controller. You can even drill right down to see the individual SQL queries that ActiveRecord is executing if you need to and it can show you the underlying database schema with another click.
Finally, you can optionally upload your profiling data to the FiveRuns site for community performance analysis and advice.
This works in Rails 4.2.6:
require 'ostruct'

o = OpenStruct.new(logger: Rails.logger)
o.extend ActiveSupport::Benchmarkable

o.benchmark 'name' do
  # ... your code ...
end