Rails move expensive method to task - ruby-on-rails

I have these two methods in my model. One looks up the Facebook like count for a single CatalogItem, and the other loops through all active CatalogItems and finds their like counts using the first method.
Running through all the active Facebook likes takes a while; it might loop over anywhere from 300 to 1000 objects, so I'd like to move this into some sort of cron job, or whatever you suggest.
I was thinking I should add a column to CatalogItem called cached_fb_count and adapt self.facebook_likes to write to that column whenever the task runs.
Is this the right approach? What would that task look like if it ran every 2 hours?
def self.facebook_likes
  self.active.each_with_index do |i, index|
    _likes = i.facebook_like_count
    i.update_attribute(:cached_likes, _likes)
    # puts "#{index + 1} Likes: #{_likes} ########### ID: #{i.id}"
  end
end

def facebook_like_count
  # needs require 'open-uri' and require 'json'
  url = "https://api.facebook.com/method/fql.query?query=select%20like_count%20from%20link_stat%20where%20url='https://www.foobar.com/catalog_items/#{self.id}'&format=json"
  item_like_count = JSON.parse(open(url).read).first.flatten[1]
  item_like_count += 1 if item_like_count > 0
  item_like_count # return the count explicitly; otherwise a zero count returns nil
end

Delayed_job is a perfect tool for asynchronous tasks. It runs in a separate process and is database-backed (ActiveRecord), so it saves the execution context as a simple script invocation, and it has rich functionality including task priorities and scheduling. But if your tasks involve huge queues, consider the Resque gem: it uses Redis as storage for its tasks and deals much faster with long queues.

Use whenever; it's very easy to set up. Here is the link: https://github.com/javan/whenever
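With whenever, the every-2-hours run the question asks about is only a few lines in config/schedule.rb. A sketch, assuming CatalogItem.facebook_likes (from the question) is the method that writes the cached counts:

```ruby
# config/schedule.rb (whenever DSL)
every 2.hours do
  # boots a Rails runner process and calls the class method from the question
  runner "CatalogItem.facebook_likes"
end
```

Running `whenever --update-crontab` then writes the corresponding crontab entry.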

Related

How to do periodic polling/querying of the database every 5 seconds in the background in Ruby on Rails

I have a Ruby on Rails application which sends push notifications based on new records created in the database every 5 seconds. Is there any way, apart from cron jobs, to poll/query the DB to check for new records?
My current implementation runs cron every minute and sleeps for 5 seconds between checks, but this approach has huge memory consumption.
schedule.rb # whenever gem file
every 1.minute do
  runner "DailyNotificationChecker.send_notifications"
end
### the code below calls process_notes every 5 seconds
def self.send_notifications
  expiry_time = Time.now + 57
  while Time.now < expiry_time
    if RUN_SCHEDULER == "true" || RUN_SCHEDULER == true
      process_notes # calls the method below
    end
    sleep 5 # seconds
  end
end
### the code below checks for new records and sends notifications
def self.process_notes
  notes = nil
  time = Benchmark.measure do
    Note.uncached do
      notes = Note.where("(created_at > DATE_SUB(now(), INTERVAL 2 minute)) AND `notes`.`processed` = 0")
      if notes.present?
        note_ids = notes.collect { |x| x.id }
        RealtimeNotifier.new.delay.perform(note_ids, NOTE_CREATED, TEMP_USER_NOTE)
      end
    end
  end
end
This code snippet works, but it is not an optimised solution. Is there a better way to achieve this?
MySQL allows you to set up a trigger that calls a user-defined function (written in C or C++); other databases may have similar functionality. From there you can replace the currently running process with a bash script containing the command(s) to run. I can't give you an example, since I have never needed this, but it might point you in the right direction.
Have a look at the following pages:
MySQL Adding a New User-Defined Function
man 3 exec (you might need man 2 fork first, to split off a new process; here is an example)
How to Create and Use Bash Scripts (trigger rake command here)
Applying this solution would change your poll based solution into a push based solution.
I have a ruby on rails application which sends push notifications based on new records created in database in every 5 seconds.
So you currently have a job/process that sits in memory, polls the database, and then does something with new records. How about inverting the workflow? Some other part of your code creates those records, yes? Make that part also put a new RealtimeNotifier job in the queue. Cut out the middleman*, so to say.
* Yes, the code is now more coupled, but it is also easier to reason about.
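A minimal plain-Ruby sketch of that inversion (no Rails here: an array stands in for the job backend, and a Struct stands in for the Note model). In the actual app this would be an after_create callback that enqueues RealtimeNotifier, the class from the question:

```ruby
QUEUE = []

# Stand-in for the Note model: creating a record enqueues its notification
# job immediately, so no poller is needed.
Note = Struct.new(:id, :processed) do
  def self.create(id)
    note = new(id, false)
    QUEUE << [:realtime_notify, note.id] # push-based: enqueue at creation time
    note
  end
end

Note.create(1)
Note.create(2)
```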

Idempotent Design with Sidekiq Ruby on Rails Background Job

Sidekiq recommends that all jobs be idempotent (able to run multiple times without being an issue) as it cannot guarantee a job will only be run one time.
I am having trouble understanding the best way to achieve that in certain cases. For example, say you have the following table:
User
id
email
balance
The background job that is run simply adds some amount to their balance
def perform(user_id, balance_adjustment)
  user = User.find(user_id)
  user.balance += balance_adjustment
  user.save
end
If this job is run more than once their balance will be incorrect. What is best practice for something like this?
Thinking about it, a potential solution I can come up with is to create a record before scheduling the job, something like:
PendingBalanceAdjustment
user_id
balance_adjustment
When the job runs it will need to acquire a lock for this user so that there's no chance of a race condition between two workers and then will need to both update the balance and delete the record from pending balance adjustment before releasing the lock.
The job then looks something like this?
def perform(user_id, balance_adjustment_id)
  user = User.find(user_id)
  pba = PendingBalanceAdjustment.where(:balance_adjustment_id => balance_adjustment_id).take
  if pba.present?
    $redis.lock("#{user_id}/balance_adjustment") do
      user.balance += pba.balance_adjustment
      user.save
      pba.delete
    end
  end
end
This seems to solve both
a) Race condition between two workers taking the job at the same time (though you'd think Sidekiq could guarantee this already?)
b) A job being run multiple times after running successfully
Is this pattern a good solution?
You're on the right track; you want to use a database transaction, not a redis lock.
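A plain-Ruby sketch of that transaction-based idempotency (no database here: hashes stand in for the two tables and a Mutex stands in for the row lock; in Rails the same three steps would sit inside user.with_lock { ... }, which opens a transaction and takes a row-level lock):

```ruby
BALANCES = { 1 => 100 }                          # users table
PENDING  = { 42 => { user_id: 1, amount: 25 } }  # pending_balance_adjustments table
LOCK     = Mutex.new                             # stand-in for a row-level DB lock

def perform(adjustment_id)
  LOCK.synchronize do
    pba = PENDING[adjustment_id]
    return unless pba                   # already applied: the rerun is a no-op
    BALANCES[pba[:user_id]] += pba[:amount]
    PENDING.delete(adjustment_id)       # consuming the record makes the job idempotent
  end
end

perform(42)
perform(42)  # duplicate delivery: the balance is unchanged
```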
I think you're on the right track too, but your solution might be overkill; I don't have full knowledge of your application.
A simpler solution would be to have a flag on your User model, like balance_updated:datetime, and check that before updating.
As Mike mentions, using a transaction block should ensure it's thread-safe.
In any case, to answer your question more generally: having an updated_ column is usually good enough to start with, and if things get complicated you can move this into another model.

Accessing rake task variables in controller and Scheduling rake tasks

I have a rake task send_emails which sends e-mails to a lot of people. I call this rake task from a controller as shown in the Rake in Background railscast. But I want to schedule this rake task to run at a particular date and time, which is not the same every day (it's not a cron job). The date and time are set dynamically from a form.
For the above rake task for sending emails, I want to show the status of the mailing process to the end user. For instance, say there is a response object in the rake task which I can use as response.status, response.delivered?, response.address, etc. How can I access this object (or any variable) from the rake file in my controller?
I don't want to use delayed_job but want to implement its run_at and in_the_future functionality. Also, the whenever gem won't solve my first problem because I won't be able to pass the date and time to its scheduler.
First thing: calling a rake task from a controller is bad practice. Ryan published that video in 2008; many better solutions have come up since then. You shouldn't ignore that.
I suggest you use delayed_job; it serves your needs well. If you want to invoke a task dynamically, something has to keep checking the relevant field. Delayed job polls its database table continuously, so you can use that.
Anyway, you can use something like this:
def self.run_when
  Scheduler.all.each do |s|
    if s.dynamically_assigned_field < 1.second.ago
      s.run_my_job!
      s.status = "finished"
      s.save
    end
  end
end
And in the model you can do something like this:
def run_my_job!
  self.status = "processing"
  self.save
  long_running_task
end
One thing you should also keep in mind: if too many workers/batch/cron jobs start at the same time, they will fight for resources and may enter a deadlock state. Limit the number of running jobs according to your server capacity.
Sidekiq is also a good option to consider. Personally, I like Sidekiq because it doesn't hit my database every time and scales very effectively. It uses Redis, but that is expensive.
I would create a new model for mail jobs, like this:
app/models/mail_job.rb
class MailJob < ActiveRecord::Base
  attr_accessible :email, :send_at, :delivered

  scope :should_deliver, -> { where(delivered: false).where('send_at <= ?', Time.now) }

  def should_deliver?
    !delivered? && send_at <= Time.now
  end
  ...
end
And use Sidekiq + Sidetiq, running every minute (or any other interval) and checking for mail jobs that should be delivered.
Hope this helps!
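The predicate part of that model can be checked without a database; a quick plain-Ruby stand-in for the sketched MailJob:

```ruby
# Struct stand-in for the MailJob model above (no ActiveRecord needed here);
# should_deliver? mirrors the scope: not yet delivered and send_at has passed.
MailJob = Struct.new(:email, :send_at, :delivered) do
  def should_deliver?
    !delivered && send_at <= Time.now
  end
end

DUE     = MailJob.new("a@example.com", Time.now - 60,   false) # past, undelivered
NOT_DUE = MailJob.new("b@example.com", Time.now + 3600, false) # still in the future
DONE    = MailJob.new("c@example.com", Time.now - 60,   true)  # already delivered
```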

What's the best way to schedule and execute repetitive tasks (like scraping a page for information) in Rails?

I'm looking for a solution which enables:
Repetitive executing of a scraping task (nokogiri)
Changing the time interval via http://www.myapp.com/interval (example)
What is the best solution/way to get this done?
Options I know about
Custom Rake task
Rufus Scheduler
Current situation
In ./config/initializers/task_scheduler.rb I have:
require 'nokogiri'
require 'open-uri'
require 'rufus-scheduler'
require 'rake'

scheduler = Rufus::Scheduler.new

scheduler.every "1h" do
  puts "BEGIN SCHEDULER at #{Time.now}"
  @url = "http://www.marktplaats.nl/z/computers-en-software/apple-ipad/ipad-mini.html?query=ipad+mini&categoryId=2722&priceFrom=100%2C00&priceTo=&startDateFrom=always"
  @doc = Nokogiri::HTML(open(@url))
  @title = @doc.at_css("title").text
  @number = 0
  2.times do |number|
    @doc.css(".defaultSnippet.group-#{@number}").each do |listing|
      @listing_title = listing.at_css(".mp-listing-title").text
      @listing_subtitle = listing.at_css(".mp-listing-description").text
      @listing_price = listing.at_css(".price").text
      @listing_priority = listing.at_css(".mp-listing-priority-product").text
      listing = Listing.create(title: "#{@listing_title}", subtitle: "#{@listing_subtitle}", price: "#{@listing_price}")
    end
    @number += 1
  end
  puts "END SCHEDULER at #{Time.now}"
end
Is it not working?
Yes, the current setup is working. However, I don't know how to make the interval time changeable via http://www.myapp.com/interval (example).
Changing scheduler.every "1h" to scheduler.every "#{@interval}" does not work.
In what file do I have to define @interval for it to work in task_scheduler.rb?
I'm not very familiar with Rufus Scheduler, but it appears it will be difficult to achieve both of your goals (regular heartbeat, dynamically rescheduled) with it. For it to work, you'll have to capture the job_id that it returns, use that job_id to stop the job when a rescheduling event occurs, and then create the new job. Rufus also points out that it's an in-memory application whose jobs disappear when the process disappears: reboot the server or restart the application and you've got to reschedule from scratch.
I'd consider two things. First, I'd consider creating a model that wraps the screen-scraping that you want to do. At a minimum you'd capture the url and the interval. The model may wrap up the code for processing the html response (basically what's wrapped up in the 2.times block) as instance methods that you trigger based on the URL. You may also capture this in a text column and use eval on it, assuming that only "good guys" get access to this part of the system. This has a couple of advantages: you can quickly expand to scraping other sites and you can sanitize the interval sent back by the user.
Second, something like Delayed::Job may better suit your needs. Delayed::Job allows you to specify a time for the job's execution which you could fill in by reading the model and converting the interval to a time. The key to this approach is that the job must schedule the next iteration of itself before it exits.
This won't be as rock-steady as something like cron but it does seem to better address the rescheduling need.
First off: your rufus-scheduler code is in an initializer, which is fine, but it is executed only when the rails process starts. So in the initializer you have no access to any variable @interval you might set, for instance, in a controller.
Possible options, instead of a class variable:
read it from a config file
read it from a database (but you will have to set up your own connection; in the initializer ActiveRecord is not started yet, imho)
And... if you change the value, you will have to restart your rails process for it to take effect.
So an alternative approach, where your rails process handles the interval of the scheduled job, is to use a recurring background job. At the end of the job, it reschedules itself with the interval that is active at that moment, fetched from the database, I would propose.
Any background job handler could do this. Check Ruby Toolbox; I vote for resque or delayed_job.
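That reschedule-yourself loop can be shown with a runnable toy (no job library: an array of [run_at, job] pairs stands in for the job table, and interval_from_db is an assumed stand-in for the database read):

```ruby
# Toy scheduler: each run of the job enqueues its next iteration using the
# interval that is current at that moment, so a changed interval takes effect
# on the very next run without restarting anything.
INTERVALS = [60, 60, 300]  # pretend the stored interval changes over time
RUNS = []

def interval_from_db(i)
  INTERVALS.fetch(i, 300)  # stand-in for reading the interval from the database
end

queue = [[0, :scrape]]     # [run_at_seconds, job]
3.times do |i|
  run_at, job = queue.shift
  RUNS << run_at                                # "do the work": record the tick
  queue << [run_at + interval_from_db(i), job]  # reschedule before exiting
end
```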

Rails 3.1/rake - datespecific tasks without queues

I want to give my users the option of being sent a daily summary of their account statistics at a specific (user-given) time...
Let's say the following model:
class DailySummery < ActiveRecord::Base
  # attributes:
  #   send_at
  #     => 10:00 (hour)
  #   last_sent_at
  #     => time of the last sent summary
end
Is there now a best practice for sending these account summaries via email at the specified time?
At the moment I have an infinite rake task running which permanently checks whether emails are available for sending, and I would like to put the daily-summary generation and sending into this rake task.
I had the thought that I could solve this with the following pseudo-code:
while true
  User.all.each do |u|
    u.generate_and_deliver_dailysummery if u.last_sent_at < Time.now - 24.hours
  end
  sleep 60
end
But I'm not sure if this has some hidden caveats...
Notice: I don't want to use queues like Resque or Redis or anything like that!
EDIT: Added sleep (I already have it in my script)
EDIT: It's a time-critical service (notification of trade rates), so it should be as fast as possible. That's the background to why I don't want to use a queue- or job-based system. And I use Monit to manage this rake task, which works really well.
There are really only two main ways to do delayed execution. You run the script when a user on your site hits a page, which is inefficient and not entirely accurate. Or you use some sort of background process, whether it's a cron job or resque/delayed_job/etc.
While your method of having a rake process run forever will work fine, it's inefficient because you're iterating over users 24/7, starting again as soon as it finishes. Something like:
while true
  # IS NULL is needed: "last_sent_at = ?" with nil never matches in SQL
  User.where("last_sent_at <= ? OR last_sent_at IS NULL", 24.hours.ago).each do |u|
    u.generate_and_deliver_dailysummery
  end
  sleep 3600
end
This runs once an hour and only pulls the users that need an email sent, which is a bit more efficient. The best practice, though, would be to use a cron job that runs your rake task.
Running a task periodically is what cron is for. The whenever gem (https://github.com/javan/whenever) makes it simple to configure cron definitions for your app.
As your app scales, you may find that the rake task takes too long to run and that the queue is useful on top of cron scheduling. You can use cron to control when deliveries are scheduled but have them actually executed by a worker pool.
I see two possibilities for running a task at a specific time.
Background process / worker / ...
It's what you have already done. I refactored your example, because there were two bad things:
Check conditions directly in your database; it's more efficient than loading potentially useless data.
Load users in batches. Imagine your database contains millions of users... I'm pretty sure you would be happy, but not Rails... not at all. :)
Besides your code, I see another problem: how are you going to manage this background job on your production server? If you don't want to use Resque or something similar, you should consider managing it another way. There are Monit and God, which are both process monitors.
while true
  # Check the condition in the database (last_sent_at, not created_at, is what decides)
  users = User.where(['last_sent_at < ? OR last_sent_at IS NULL', 24.hours.ago])
  # Load in batches of 1000
  users.find_each(:batch_size => 1000) do |u|
    u.generate_and_deliver_dailysummery
  end
  sleep 60
end
Cron jobs / scheduled tasks / ...
The second possibility is to schedule your task recurrently, for instance every hour or half-hour. Correct me if I'm wrong, but do your users really need to schedule the delivery at 10:39am? I think letting them choose the hour is enough.
With that constraint, a job fired every hour is better than an infinite task querying your database every single minute. Moreover, it's really easy to do, because you don't need to set anything up.
There is a good gem for managing cron tasks with Ruby syntax. More info here: Whenever
You can do that; you'll just also need to check for the time you want to send at. Starting with your pseudo-code and adding to it:
while true
  User.all.each do |u|
    if u.last_sent_at < Time.now - 24.hours && Time.now.hour >= u.send_at
      u.generate_and_deliver_dailysummery
      # the next two lines are only needed if generate_and_deliver_dailysummery doesn't set last_sent_at already
      u.last_sent_at = Time.now
      u.save
    end
  end
  sleep 900
end
I've also added the sleep so you don't needlessly hammer your database. You might also want to look into limiting that loop to just the set of users you need to send to; a query similar to what Zachary suggests would be much more efficient than what you have.
If you don't want to use a queue, consider delayed_job (a sort of poor man's queue); it runs as a rake task, similar to what you are doing:
https://github.com/collectiveidea/delayed_job
http://railscasts.com/episodes/171-delayed-job
It stores all tasks in a jobs table. Usually, when you add a task, it is queued to run as soon as possible; however, you can override this to delay it until a specific time.
You could convert your DailySummary class to a DailySummaryJob, and once complete it could re-queue a new instance of itself for the next day's run.
How do you update the last_sent_at attribute?
If you use
last_sent_at += 24.hours
initialized with last_sent_at = Time.now.at_beginning_of_day + send_at,
it will all be OK.
Don't use last_sent_at = Time.now: there may be some delay before the job is actually done, which will make the last_sent_at attribute drift later and later.
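The drift can be seen with plain arithmetic. A minimal sketch, assuming each run completes 90 seconds late (times in seconds; 86_400 is 24 hours; the 90-second lag is an invented figure for illustration):

```ruby
DAY   = 86_400
DELAY = 90  # assumed processing lag on each run

# last_sent_at = Time.now pattern: every stamp includes the lag, so it accumulates
DRIFTING = (1..3).reduce(0) { |t, _| t + DAY + DELAY }

# last_sent_at += 24.hours pattern: the stamp ignores the lag, staying on schedule
ANCHORED = (1..3).reduce(0) { |t, _| t + DAY }
```

After just three runs the drifting timestamp is already 270 seconds behind the anchored one, and the gap keeps growing.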
