In my app there is a financial overview page with quite a lot of queries. This page is refreshed once a month after executing a background job, so I added caching:
@iterated_hours = Rails.cache.fetch("productivity_data", expires_in: 24.hours) do
  FinancialsIterator.new.create_productivity_iterations(@company)
end
The cache must expire when the background job finishes, so I created a model CacheExpiration:
class CacheExpiration < ApplicationRecord
  validates :cache_key, :expires_in, presence: true
end
So in the background job a record is created:
CacheExpiration.create(cache_key: "productivity_data", expires_in: DateTime.now)
And the Rails.cache.fetch is updated to:
expires_in = get_cache_key_expiration("productivity_data")
@iterated_hours = Rails.cache.fetch("productivity_data", expires_in: expires_in) do
  FinancialsIterator.new.create_productivity_iterations(@company)
end
private def get_cache_key_expiration(cache_key)
  cache_expiration = CacheExpiration.find_by_cache_key(cache_key)
  if cache_expiration.present?
    cache_expiration.expires_in
  else
    24.hours
  end
end
So now the expiration is set to a DateTime. Is this correct, or should it be a number of seconds? And is this the correct approach to make sure the cache is expired exactly once, when the background job finishes?
Explicitly setting an expires_in value is very limiting and error prone, IMO. You cannot change the value once a cache entry has been created (well, you can clear the cache manually), and if you ever want the background job to run more or less often, you also have to remember to update the expires_in value. Additionally, the time the background job finishes might differ from the time the first request to the view is made. In the worst case, a request comes in a minute before the background job updates the information for the view, and your users then have to wait a whole day for current information.
A more flexible approach is to rely on the updated_at (or, in its absence, created_at) fields of your ActiveRecord models.
For that, you can either rely on the CacheExpiration model you already created (it might already have the appropriate fields) or use the last of the "huge number of records" you create: simply order them and take the most recent one, SomeArModel.order(created_at: :desc).first.
The benefit of this approach is that whenever the AR model you create is updated or created, your cache is busted and a new entry will be written. There is no longer any coupling between the time a user called the endpoint and the time the background job ran. And if a record is created by any means other than the background job, that case is handled too.
ActiveRecord models are first-class citizens when it comes to caching. You can simply pass them in as cache keys. Your code would then change to:
Rails.cache.fetch(CacheExpiration.find_by_cache_key("productivity_data")) do
  FinancialsIterator.new.create_productivity_iterations(@company)
end
But if at all possible, try to find an alternative model so you no longer have to maintain CacheExpiration.
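For instance, a minimal sketch of that idea, keying the cache on the newest record the background job writes (SomeArModel stands in for whatever model that actually is):
latest = SomeArModel.order(created_at: :desc).first
@iterated_hours = Rails.cache.fetch([latest, "productivity_data"]) do
  FinancialsIterator.new.create_productivity_iterations(@company)
end
Rails derives the cache key and version from the record's id and updated_at, so every new or touched record invalidates the old entry without any manual bookkeeping.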
Rails also has a guide on that topic.
I currently have the following simple controller:
class SimpleController < ApplicationController
  def index
    @results = fetch_results
  end
end
fetch_results is a fairly expensive operation, so although the above works, I don't want to run it every time the page is refreshed. How can I decouple the updating of @results so that it's updated on a fixed schedule, let's say every 15 minutes?
That way, each time the page is loaded it will just return the current @results value, which at worst would be 14 minutes and 59 seconds out of date.
You might want to use Rails' low-level caching for this:
def index
  @results = Rails.cache.fetch('fetched_results', expires_in: 15.minutes) do
    fetch_results
  end
end
Read more about how to configure Rails' caching stores in the Rails Guide.
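Note that in development Rails.cache is typically a no-op (the :null_store) unless caching is enabled, so for the snippet above to actually cache anything a store must be configured. A minimal sketch (the store choice here is illustrative):
# config/environments/production.rb
config.cache_store = :memory_store, { size: 64.megabytes }
With several processes or servers, a shared store such as :mem_cache_store or :redis_cache_store is the usual choice, since :memory_store is private to each process.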
Well, I would store this in a database table and update it regularly via a background job. The updates would be tied to events: if the user has done something that may change the result, the result is recalculated and updated.
Another solution is to update the result on a fixed schedule, say every hour, using cron jobs. There is a good gem that can handle it.
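For illustration only (the answer doesn't name the gem), a schedule written with the whenever gem might look like this; RefreshResultsJob is a hypothetical job name:
# config/schedule.rb (whenever gem)
every 1.hour do
  runner "RefreshResultsJob.perform_now"
end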
In order to process events asynchronously and create an activity feed, I'm using Sidekiq and Ruby on Rails' Global ID.
This works well for most types of activities; however, some of them require data that could change by the time the job is performed.
Here's a completely made-up example:
class Movie < ActiveRecord::Base
  include Redis::Objects

  value :score # stores an integer in Redis

  has_many :likes

  def popular?
    likes.count > 1000
  end
end
And a Sidekiq worker performing a job every time a movie is updated:
class MovieUpdatedWorker
  include Sidekiq::Worker

  def perform(global_id)
    movie = GlobalID::Locator.locate(global_id)
    MovieUpdatedActivity.create(movie: movie, score: movie.score) if movie.popular?
  end
end
Now, imagine Sidekiq is lagging behind and, before it gets a chance to perform its job, the movie's score is updated in Redis, some users unliked the movie, and the popular? method now returns false.
Sidekiq ends up working with updated data.
I'm looking for ways to schedule jobs while making sure the required data won't change when the job is performed. A few ideas:
1/ Manually pass in all the required data and adjust the worker accordingly:
MovieUpdatedWorker.perform_async(
  movie: self,
  score: score,
  likes_count: likes.count
)
This could work, but it would require reimplementing/duplicating all the methods that rely on data such as score and popular? (imagine an app with many more than these two or three moving pieces).
This doesn't scale well either since serialized objects could take up a lot of room in Redis.
2/ Stubbing some methods on the record passed in to the worker:
MovieUpdatedWorker.perform_async(
  global_id,
  stubs: { score: score, popular?: popular? }
)
class MovieUpdatedWorker
  include Sidekiq::Worker

  def perform(global_id, stubs: {})
    movie = GlobalID::Locator.locate(global_id)

    # inspired by RSpec
    stubs.each do |message, return_value|
      movie.stub(message) { return_value }
    end

    MovieUpdatedActivity.create(movie: movie, score: movie.score) if movie.popular?
  end
end
This isn't functional, but you can imagine the convenience of dealing with an actual record, not having to reimplement existing methods, and dealing with the actual data.
Do you see other strategies to "freeze" object data and asynchronously process them? What do you think about these two?
I wouldn't say that the data is stale, since you would actually have the newest version of it; the movie is just no longer popular. It sounds like you actually want the stale version.
If you don't want the data to have changed, you need to capture it somehow. Either, as you say, pass the data to the job directly, or add some form of versioning of the data in the database and pass a reference to the old version.
I think passing the data you need along to Redis is a reasonable way. You could serialize only the attributes you actually care about, like score.
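A minimal sketch of that first option, reusing the question's names (string keys, because Sidekiq round-trips job arguments through JSON):
class MovieUpdatedWorker
  include Sidekiq::Worker

  def perform(movie_id, snapshot)
    movie = Movie.find(movie_id)
    # Use the values captured at enqueue time, not the live record's.
    MovieUpdatedActivity.create(movie: movie, score: snapshot["score"]) if snapshot["popular"]
  end
end

# At enqueue time:
MovieUpdatedWorker.perform_async(movie.id, { "score" => movie.score, "popular" => movie.popular? })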
I have a database that gets updated once per day. Rather than querying it in my controller every time a user makes a request, e.g.:
@rainy = Place.where("forecast = 'rain'")
Can I run that query just once per day, after the DB update, and make the values available in a variable that my controller can access?
You can, but that isn't really a good solution. Rails already has a way of persisting the result of expensive computations across requests: Caching.
Use
@rainy = Rails.cache.fetch(:rainy, expires_in: 1.day) do
  Place.where(forecast: 'rain').to_a # load the records; caching a lazy relation would re-run the query on every hit
end
If you need the value to expire at a specific time each day, you can define a rake task which computes the value, and then expires the cache:
# lib/tasks/rainy.rake
task compute_rainy: :environment do
  # ... whatever computation produces your database value ...

  # Then, expire the cache for :rainy so that the next
  # request loads (and caches) the new value
  Rails.cache.delete(:rainy)
end
I am currently working on a Rails project, and I was asked to save the progress of Sidekiq workers and store it, so the user of the application can see it. Now I am faced with this dilemma: is it better to just write out to a text file or to save it in a database?
If it is a database, then how do I save it in a model object? I know we can store the progress of workers by just sending the info to a log file.
class YourWorker
  include Sidekiq::Worker

  def perform
    logger.info { "Things are happening." }
    logger.debug { "Here's some info: #{hash.inspect}" }
  end
end
So if I want to save the progress of workers in a data model, how do I do that?
Your thread title says that the data is unstructured, but your problem description indicates that the data should be structured. Which is it? Speed is not always the most important consideration, and it doesn't seem to be very important in your case. The most important consideration is how your data will be used. Will the way your data is used in the future change? Usually, a database with an appropriate model is the better answer because it allows flexibility for future requirements. It also allows other clients access to your data.
You can create a Job class and then update some attribute of the currently working job.
class Job < ActiveRecord::Base
  # assume that there is a 'status' attribute that is defined as 'text'
end
Then when you queue something to happen you create a new Job and pass the id of the Job to perform or perform_async.
job = Job.create!
YourWorker.perform_async job.id
Then in your worker, you'd receive the id of the job to be worked on, and then retrieve and update that record.
def perform(job_id)
  job = Job.find(job_id)
  job.status = "It's happening!"
  job.save
end
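A natural extension (not in the original answer) is to update the status incrementally as the work proceeds; process_item here is a hypothetical per-item method:
def perform(job_id, item_ids)
  job = Job.find(job_id)
  item_ids.each_with_index do |item_id, index|
    process_item(item_id) # hypothetical: whatever work each step does
    job.update(status: "#{index + 1}/#{item_ids.size} processed")
  end
end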
I want to display a random record from the database for a certain amount of time; after that time, it gets refreshed to another random record.
How would I go about that in Rails?
Right now I'm looking in the direction of cron jobs and the whenever gem, but I'm not 100% sure I really need all that for what seems to be a pretty simple action.
Use the Rails.cache mechanism.
In your controller:
@record = Rails.cache.fetch("cached_record", expires_in: 5.minutes) do
  Model.offset(rand(Model.count)).first
end
During the first execution, the result gets cached in the Rails cache. A new random record is retrieved after 5 minutes.
I would have an expiry_date in my model and then present the user with a JavaScript timer. After the time has elapsed, I would send a request back to the server (Ajax probably, or maybe refreshing the page) and check whether the time has indeed expired. If so, I would present the new record.
You could simply check the current time in your controller, something like:
def show
  # class variables, so the values persist across requests (per process)
  @@last_refresh ||= Time.current
  @@current ||= MyModel.get_random
  if Time.current - @@last_refresh > 5.minutes
    @@current = MyModel.get_random
    @@last_refresh = Time.current
  end
  @current = @@current
end
This kind of code wouldn't scale to more servers (as it relies on class variables for data storage), so in reality you would want to store those two values in something like Redis (or even Memcached) for high performance. It really depends on how accurate you need this to be and how much performance you need. You could just as well use your normal database to store expiry times and then load the record whose time is current.
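A sketch of that last, database-backed idea; FeaturedRecord and its record_id/expires_at columns are hypothetical names:
# Hypothetical model: featured_records(record_id: integer, expires_at: datetime)
def current_record
  featured = FeaturedRecord.where("expires_at > ?", Time.current).first
  featured ||= FeaturedRecord.create!(
    record_id: MyModel.get_random.id, # get_random as in the snippet above
    expires_at: 5.minutes.from_now
  )
  MyModel.find(featured.record_id)
end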
My first thought was to cache the record in a global, but you could end up with different records being served by different servers. How about adding a :chosen_at datetime column to your record...
class Model < AR::Base
  def self.random
    @@random = where("chosen_at IS NOT NULL").first
    return @@random unless @@random.nil? || @@random.chosen_at < 5.minutes.ago
    @@random.update_attribute(:chosen_at, nil) if @@random
    ids = connection.select_all("SELECT id FROM #{table_name}")
    @@random = find(ids[rand(ids.length)]["id"].to_i)
    @@random.update_attribute(:chosen_at, Time.current) # stamp the new pick so it sticks for 5 minutes
    @@random
  end
end