Delay sending mails to improve page load time - ruby-on-rails

In my Product#create method I have something like
ProductNotificationMailer.notify_product(n.email).deliver
This fires off once the product is saved. The thing is, before it fires there is a bunch of logic and calculation happening, which delays the confirmation page load. Is there a way to make sure the next page loads first and the mail delivery happens later, in the background?
Thanks

Yes, you'll want to look into background workers. Sidekiq, DelayedJob or Resque are some popular ones.
Here's a great RailsCast demonstrating Sidekiq.
class NotificationWorker
  include Sidekiq::Worker

  def perform(n_id)
    n = N.find(n_id)
    ProductNotificationMailer.notify_product(n.email).deliver
  end
end
I'm not sure what n was in your example, so I just went with it. Now where you do the work, you can replace it with:
NotificationWorker.perform_async(n.id)
The reason you don't pass the full object n as an argument is that the arguments get serialized, and it's easier/faster to serialize just the integer id.
Once the job is stored, a second process running in the background will do the work, freeing up your web process to go straight back to rendering the response.
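For context, the controller side might end up looking roughly like this (just a sketch: product_params, confirmation_path, and how n is obtained are assumptions, since they aren't shown in the question):

def create
  @product = Product.new(product_params)
  if @product.save
    # Enqueue the notification instead of delivering it inline;
    # the response renders immediately while Sidekiq sends the mail later.
    NotificationWorker.perform_async(n.id)
    redirect_to confirmation_path
  else
    render :new
  end
end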

Delayed Job will also do this.
Here is the GitHub page, and here is a RailsCast on setting it up.
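With Delayed Job, the mailer call from the question can likewise be pushed to the background. If I remember the README correctly, mailers go through the .delay proxy and you drop the .deliver, since the worker calls it for you; something like:

ProductNotificationMailer.delay.notify_product(n.email)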

Related

Rails debounce delayed job background task? removing duplicates

Debouncing is a common technique to postpone a function/job from executing until a certain amount of time has passed.
Use case: in a conversation with active chatting from multiple users, they should not receive an email notification for each message typed. But after a few minutes of silence, if the messages are still unread, the user should get a notification.
Delayed_Job
Has no built-in solution; there is a related issue: https://github.com/collectiveidea/delayed_job/issues/72
Sidekiq
Has a debounce gem: https://github.com/hummingbird-me/sidekiq-debounce
Doing it yourself is not so bad:
class AdminJob
  def self.debounce(job, args = {})
    handler = YAML.dump(job)
    count = Delayed::Job.where(handler: handler).where('locked_at IS NULL').delete_all
    Rails.logger.info("deleted: #{count} jobs")
    Delayed::Job.enqueue(job, args)
  end
end
Instead of writing:
Delayed::Job.enqueue(YourJobName.new(account_id), {run_at: 10.minutes.from_now})
You now write:
AdminJob.debounce(YourJobName.new(account_id), {run_at: 10.minutes.from_now})
Delayed Job serializes your job and its params to YAML and saves that to the database in the handler column. So if you call AdminJob.debounce(...) 10 times in a row, each call deletes the previously enqueued (and not yet locked) copies of the same job before enqueuing a new one, so only the last one actually runs.
Make sure to give yourself enough delay (5.minutes, etc.) so that users have time to keep taking actions. If you run the job after just 1 second, it's likely they'll keep taking actions and trigger it again.
Yes, I'm answering my own question 3 years later...
What I would do is schedule a periodic task, let's say every 5 minutes, that checks whether anyone has to be notified. Yes, it seems like an expensive operation, but for your use case I don't see (for now) other solutions. So let's say you have 10 users using a chat. Every 5 minutes you could check whether there are users with unseen messages, and if so notify them, but only if they have been inactive for N minutes.
To schedule such a task you could use the crono gem. Check this answer.
Crono lets you do things like this:
Crono.perform(CheckUsersToBeNotifiedJob).every 5.minutes
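A rough sketch of what the periodic job might look like; the scope and mailer names (with_unread_messages, last_seen_at, NotificationMailer) are assumptions for illustration only:

class CheckUsersToBeNotifiedJob
  def perform
    # Hypothetical query: users with unread messages who have been idle for a while
    User.with_unread_messages.where('last_seen_at < ?', 5.minutes.ago).find_each do |user|
      NotificationMailer.unread_messages(user).deliver
    end
  end
end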
If you are using Delayed Job as the backend for Active Job, take a look at the 'activejob-trackable' gem.
https://github.com/ignatiusreza/activejob-trackable
It uses another table that tracks the jobs and can throttle and debounce.

How to avoid hitting Heroku's API rate limit with delayed job and workless

My Survey model has about 2500 instances and I need to apply the set_state method to each instance twice. I need to apply it the second time only after every instance has had the method applied to it once. (The state of an instance can depend on the state of other instances.)
I'm using delayed_job to create delayed jobs and workless to automatically scale up/down my worker dynos as required.
The set_state method typically takes about a second to execute. So I've run the following at the heroku console:
2.times do
  Survey.all.each do |survey|
    survey.delay.set_state
    sleep(4)
  end
end
Shouldn't be any issues with overloading the API, right?
And yet I'm still seeing the following in my logs for each delayed job:
Heroku::API::Errors::ErrorWithResponse: Expected(200) <=> Actual(429 Unknown)
I'm not seeing any infinite loops -- it just returns this message as soon as I create the delayed job.
How can I avoid blowing Heroku's API rate limits?
Reviewing workless, it looks like it incurs an API call per delayed job to check the worker count, and potentially a second API call to scale up/down. So if you are running 5000 (2500 x 2) jobs within a short period, you'll end up with 5000+ API calls, well in excess of the 1200-requests-per-hour limit. I've commented over there to hopefully help reduce the overall API usage (https://github.com/lostboy/workless/issues/33#issuecomment-20982433), but I think we can offer a more specific solution for you.
In the meantime, especially since your workload is pretty predictable (like this), I'd recommend skipping workless and doing that portion yourself. That is, it sounds like you already know WHEN the scaling needs to happen (scale up right before the loop above, scale down right after). If that is the case, you could do something like this to emulate the behavior of workless:
require 'heroku-api'

heroku = Heroku::API.new(:api_key => ENV['HEROKU_API_KEY'])

# Scale workers up for the batch
heroku.post_ps_scale(ENV['APP_NAME'], 'worker', Survey.count)

2.times do
  Survey.all.each do |survey|
    survey.delay.set_state
    sleep(4)
  end
end

# Scale back down to the usual minimum
min_workers = ENV['WORKLESS_MIN_WORKERS'].present? ? ENV['WORKLESS_MIN_WORKERS'].to_i : 0
heroku.post_ps_scale(ENV['APP_NAME'], 'worker', min_workers)
Note that you'll need to remove workless from these jobs as well. I didn't see a particular way to do this for JUST certain jobs, so you might want to ask on that project if you need that. Also, if this needs to be two-pass (the first pass has to finish before the second starts), the 4-second sleep may in some cases be insufficient, but that is a different can of worms.
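If the two passes really do have to be sequential, one rough option (assuming the delayed_job_active_record backend, where Delayed::Job is a plain ActiveRecord model) is to poll the jobs table between passes instead of relying on a fixed sleep:

2.times do
  Survey.find_each { |survey| survey.delay.set_state }
  # Wait for the queue to drain before starting the next pass
  sleep(10) while Delayed::Job.where(failed_at: nil).exists?
end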
I hope that helps narrow in on what you needed, but I'm certainly happy to discuss further and/or elaborate on the above as needed. Thanks!

Catching errors with Ruby Twitter gem, caching methods using delayed_job: What am I doing wrong?

What I'm doing
I'm using the twitter gem (a Ruby wrapper for the Twitter API) in my app, which is run on Heroku. I use Heroku's Scheduler to periodically run caching tasks that use the twitter gem to, for example, update the list of retweets for a particular user. I'm also using delayed_job so scheduler calls a rake task, which calls a method that is 'delayed' (see scheduler.rake below). The method loops through "authentications" (for users who have authenticated twitter through my app) to update each authorized user's retweet cache in the app.
My question
What am I doing wrong? For example, since I'm using Heroku's Scheduler, is delayed_job redundant? Also, you can see I'm not catching (rescuing) any errors. So, if Twitter is unreachable, or if a user's auth token has expired, everything chokes. This is obviously dumb and terrible because if there's an error, the entire thing chokes and ends up creating a failed delayed_job, which causes ripple effects for my app. I can see this is bad, but I'm not sure what the best solution is. How/where should I be catching errors?
I'll put all my code (from the scheduler down to the method being called) for one of my cache methods. I'm really just hoping for a bulleted list (and maybe some code or pseudo-code) berating me for poor coding practice and telling me where I can improve things.
I have seen this SO question, which helps me a little with the begin/rescue block, but I could use more guidance on catching errors, and on the higher-level "is this a good way to do this?" plane.
Code
Heroku Scheduler job:
rake update_retweet_cache
scheduler.rake (in my app)
task :update_retweet_cache => :environment do
  Tweet.delay.cache_retweets_for_all_auths
end
Tweet.rb, cache_retweets_for_all_auths method:
def self.cache_retweets_for_all_auths
  @authentications = Authentication.find_all_by_provider("twitter")
  @authentications.each do |authentication|
    authentication.user.twitter.retweeted_to_me(include_entities: true, count: 200).each do |tweet|
      # Actually build the cache - this is good - removing to keep this short
    end
  end
end
User.rb, twitter method:
def twitter
  authentication = Authentication.find_by_user_id_and_provider(self.id, "twitter")
  if authentication
    @twitter ||= Twitter::Client.new(:oauth_token => authentication.oauth_token, :oauth_token_secret => authentication.oauth_secret)
  end
end
Note: As I was posting this, I noticed that I'm finding all "twitter" authentications in the "cache_retweets_for_all_auths" method, then calling the "User.twitter" method, which specifically limits to "twitter" authentications. This is obviously redundant, and I'll fix it.
First, what is the exact error you are getting, and what do you want to happen when there is an error?
Edit:
If you just want to catch the errors and log them then the following should work.
def self.cache_retweets_for_all_auths
  @authentications = Authentication.find_all_by_provider("twitter")
  @authentications.each do |authentication|
    begin
      authentication.user.twitter.retweeted_to_me(include_entities: true, count: 200).each do |tweet|
        # Actually build the cache - this is good - removing to keep this short
      end
    rescue => e
      # Either create an object where the error is logged, or output it to whatever log you wish.
    end
  end
end
This way, when it fails it will keep moving on to the next user but will still make a note of the error. Most of the time with Twitter it's just better to do something like this than to try to handle each error on its own. I have seen so many weird things and random errors out of the Twitter API that trying to track down every error almost always turns into a wild goose chase, though it is still good to keep a record just in case.
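As a concrete example, you could rescue the gem's own error classes rather than everything (a sketch; Twitter::Error was the base error class in the twitter gem of that era, so adjust to your version):

begin
  authentication.user.twitter.retweeted_to_me(include_entities: true, count: 200).each do |tweet|
    # build the cache
  end
rescue Twitter::Error, Errno::ETIMEDOUT => e
  Rails.logger.error("Retweet cache failed for authentication #{authentication.id}: #{e.class} - #{e.message}")
end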
Next, when you should use what.
You should use a scheduler when you need something to happen based on time only, and delayed jobs when it's based on a user action but the 'action' you are going to delay would take too long for a normal response. Sometimes you can also just put the thing plainly in the controller.
So, in other words:
The scheduler will be fine as long as the time between updates, X, is greater than the time it takes for the update to happen, Y.
If X < Y then you might want to look at calling the logic from the controller when each individual entry is accessed, instead of trying to do them all at once. The idea is that you would only update it after a certain amount of time has passed. You could store the last update time either on the model itself in a field like twitter_update_time, or in a redis or memcache instance at a unique key for the user/auth.
But if the individual update itself still takes too long, that's when you should do the above, but instead of doing the actual update, call a delayed job.
You could even set it up so that it only updates or calls the delayed job after a certain number of views, to limit things further.
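A rough sketch of that refresh-on-access idea (twitter_updated_at and the per-user job are invented names, not existing code from the question):

# e.g. on the User model
def refresh_retweet_cache_if_stale
  return if twitter_updated_at && twitter_updated_at > 15.minutes.ago
  update_attribute(:twitter_updated_at, Time.now)
  Tweet.delay.cache_retweets_for_auth(id)  # hypothetical per-user job
end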
Possible Fancy Pants
Or if you want to get really fancy you could still do it as a cron job, but have a point system based on views that weighs which entries should be updated. The idea is that certain actions add points to certain users, and if their points are over a certain amount you update them and then reset their points. That way you can target the ones you think are the most important, have the most traffic, or show up in the most search results, etc.
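A very rough sketch of the points idea (the refresh_points column, the threshold, and the per-auth job are all invented for illustration):

# Wherever a relevant view/action happens, add points:
authentication.increment!(:refresh_points)

# In the cron/rake task, refresh only the auths that have accumulated enough points:
Authentication.where('refresh_points >= ?', 10).find_each do |auth|
  Tweet.delay.cache_retweets_for_auth(auth.id)  # hypothetical per-auth job
  auth.update_attribute(:refresh_points, 0)
end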
Next, a nitpicky thing.
http://api.rubyonrails.org/classes/ActiveRecord/Batches.html
You should be using
@authentications.find_each do |authentication|
instead of
@authentications.each do |authentication|
find_each pulls in only 1000 records at a time, so if you end up with a lot of Authentications you don't pull a crazy number of records into memory.

Delayed Jobs on Rails 2: can't save in "perform" method/how to know when job is done?

I'm using the Delayed Jobs plugin for Rails 2, and every time I try to modify a model and save it in the "perform" method required by Delayed Jobs, it fails out (no error messages or anything, it's just listed as a failure in the database).
I have the "perform" method in one of my rails model files (Video), and I'm passing an instance of that model (#video, let's say) to the Delayed::Job.enqueue
Is it a known issue that you can't do database modifications while in the queue? Am I doing something wrong (it only fails when it tries to save, not when I'm actually changing the attributes, and that sounds like a database modification issue).
If this IS expected: How can I fix it? I'm trying to save a "done" attribute to true, so I know when the model is ready to get to the next step. Is there some standard way to figure out when a delayed job is done?
EDIT: I have confirmed that calling perform standalone (without delayed job) has no problems with saving (no errors or warnings, or anything). When I call it through DelayedJobs it fails IMMEDIATELY (no time out) the second it gets to the save line.
EDIT: Wait, I think I see what is going on: my "perform" is part of an "after_create" callback... which is all well and good, until I try to SAVE. It looks like when I save, it calls perform AGAIN (while already in perform), and that doesn't fly with Delayed Jobs (nor should it). For some reason I thought after_create would only get called once (not after every save). Wait, a simple test just showed that that IS the case. Hrrm... So why is perform called twice when I save, and once when I don't, under Delayed Jobs?
My code:
after_create :start_transcodes

def start_transcodes
  Delayed::Job.enqueue self
end

def perform
  puts "performing"
  self.flash_status = 100
  self.save!
  puts "done"
end
What I see:
performing
performing
2 jobs processed at 3.3406 j/s, 2 failed ...
I don't see it say "done" ever.
What I DO see in my rails log is:
"* [JOB] Video failed with NameError: undefined local variable or method `flush_deletes' for #<Paperclip::Attachment:0xb6e51da0> - 2 failed attempts
undefined local variable or method `flush_deletes' for #<Paperclip::Attachment:0xb6e51da0>"
I am using the paperclip plugin for this class, and I can call save all day (even in that perform method) with no problems. I ALSO can call save (again, even in perform) all day and not see my after_create method called more than once, UNLESS I'm using Delayed Job. (Might it be doing some sort of auto retry?)
I'm gonna look around my paperclip plugin, see what's going on...
If your save fails, it's got nothing to do with delayed_job (at least it shouldn't be, unless the save takes longer than MAX_RUN_TIME).
Try diagnosing the problem with the save, by not using delayed_job.
Also take a look at the delayed_job.log file in your logs
Okay, not sure EXACTLY what was happening, but I've made a skeleton "TranscodeJob" class in my lib directory. This class gets initialized with a reference to the video I want it to process; it processes it, saves it, and plays nicely with Delayed Job.
Basically, it looks like passing my entire complicated Video object (complete with paperclip plugins) to Delayed Job was freaking things out, and passing a simple object, with no more info than it needs makes things much easier.
Below is the code I used, which works just fine (and if that works fine, I can add my long-running code back little by little and confirm it continues to do so; it worked fine before, it just hiccuped on saving).
class TranscodeJob
  def initialize(video_id)
    @video_id = video_id
  end

  # delayed_job's expected entry point
  def perform
    @video = Video.find(@video_id)
    @video.flash_status = 100
    @video.save!
  end
end
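The after_create callback then just enqueues this lightweight job instead of the model itself, something like:

after_create :start_transcodes

def start_transcodes
  Delayed::Job.enqueue TranscodeJob.new(id)
end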
This code is STILL called from an after_create filter, and I'm not seeing it called twice, so it looks like I mistook Delayed Job's auto-retry for recursion, or something.

Need alternative to filters/observers for Ruby on Rails project

Rails has a nice set of filters (before_validation, before_create, after_save, etc) as well as support for observers, but I'm faced with a situation in which relying on a filter or observer is far too computationally expensive. I need an alternative.
The problem: I'm logging web server hits to a large number of pages. What I need is a trigger that will perform an action (say, send an email) when a given page has been viewed more than X times. Due to the huge number of pages and hits, using a filter or observer will result in a lot of wasted time because, 99% of the time, the condition it tests will be false. The email does not have to be sent out right away (i.e. a 5-10 minute delay is acceptable).
What I am instead considering is implementing some kind of process that sweeps the database every 5 minutes or so and checks to see which pages have been hit more than X times, recording that state in a new DB table, then sending out a corresponding email. It's not exactly elegant, but it will work.
Does anyone else have a better idea?
Rake tasks are nice! But you will end up writing more custom code for each background job you add. Check out the Delayed Job plugin http://blog.leetsoft.com/2008/2/17/delayed-job-dj
DJ is an asynchronous priority queue that relies on one simple database table. According to the DJ website you can create a job using the Delayed::Job.enqueue() method, as shown below.
class NewsletterJob < Struct.new(:text, :emails)
  def perform
    emails.each { |e| NewsletterMailer.deliver_text_to_email(text, e) }
  end
end

Delayed::Job.enqueue( NewsletterJob.new("blah blah", Customers.find(:all).collect(&:email)) )
I was once part of a team that wrote a custom ad server, which has the same requirements: monitor the number of hits per document, and do something once they reach a certain threshold. This server was going to be powering an existing very large site with a lot of traffic, and scalability was a real concern. My company hired two Doubleclick consultants to pick their brains.
Their opinion was: The fastest way to persist any information is to write it in a custom Apache log directive. So we built a site where every time someone would hit a document (ad, page, all the same), the server that handled the request would write a SQL statement to the log: "INSERT INTO impressions (timestamp, page, ip, etc) VALUES (x, 'path/to/doc', y, etc);" -- all output dynamically with data from the webserver. Every 5 minutes, we would gather these files from the web servers, and then dump them all in the master database one at a time. Then, at our leisure, we could parse that data to do anything we well pleased with it.
Depending on your exact requirements and deployment setup, you could do something similar. The computational requirement to check if you're past a certain threshold is still probably even smaller (guessing here) than executing the SQL to increment a value or insert a row. You could get rid of both bits of overhead by logging hits (special format or not) and then periodically gathering them, parsing them, loading them into the database, and doing whatever you want with them.
When saving your Hit model, update a redundant column on your Page model that stores a running total of hits. This costs you a couple of extra queries, so each hit may take roughly twice as long to process, but then you can decide whether you need to send the email with a simple if.
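A rough sketch of that counter-column approach (the hits_count column, the threshold, and the mailer are made-up names for illustration):

class Hit < ActiveRecord::Base
  belongs_to :page
  after_create :bump_page_counter

  THRESHOLD = 1000  # illustrative

  def bump_page_counter
    page.increment!(:hits_count)
    # Rails 2-style mailer call; PageMailer is hypothetical
    PageMailer.deliver_threshold_reached(page) if page.hits_count == THRESHOLD
  end
end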
Your original solution isn't bad either.
class ApplicationController < ActionController::Base
  before_filter :increment_fancy_counter

  private

  def increment_fancy_counter
    # somehow increment the counter here
  end
end

# lib/tasks/fancy_counter.rake
namespace :fancy_counter do
  task :process do
    # somehow process the counter here
  end
end
Have a cron job run rake fancy_counter:process however often you want it to run.
