Let's see if I can explain this problem in a structured enough way.
I run a web service that handles email sending asynchronously using RabbitMQ and a Ruby lib called Minion. On certain models (like a comment) we have an after-create hook that adds an email event to Rabbit. This event is then processed by a worker that runs as a rake task (a gist here), which loads the appropriate user and sends the email.
This setup works in 90% of cases, but every now and then the worker crashes due to an ActiveRecord::RecordNotFound exception. How is this possible? I queue the event after the object is created, and it takes additional milliseconds for the event to pass through the Rabbit layer. Could it be that the context of the rake task causes the problem? Is it a bad choice to run long-running workers within rake with the environment flag? Help! :)
Related
I have a Rails module that processes some ActiveRecord objects, only about 15-20 at a time, that I need to kick off every two minutes.
I have tried to offload it to Sidekiq (and sidekiq-cron), which works, but the concurrency created many race conditions and duplicate data.
I really just need a simple rake task cron for Rails, or maybe Sinatra (as I would create a new Sinatra app just to complete these tasks).
I either need to force Sidekiq to process in a single thread, or
have a "cron" job run a rake task or even the module directly:
def self.process_events
  events = StripeEvent.where(processed: false)
  events = StripeServices.arrange_processing_order events
  events.each do |event_obj|
    StripeServices.new(event_obj).process_event_obj
  end
end
Thanks for any pointer in the right direction.
edited
Sorry, I wasn't very clear. Pushing my module to Sidekiq caused concurrency issues that I wasn't ready for (my bit of code is not thread-safe), and with the restrictions that Heroku places on "crons", what's the best way to run a rake task every 2 minutes?
If Sinatra can do it, I would prefer it, but I can't find the solution for that same problem.
It's not clear what you are asking. You already tried option 1; you can try option 2 (create the task and cron it, it's pretty easy), and you'll know better than anyone which is better.
Anyway, I guess that both methods will have concurrency problems if one task takes more than 2 minutes.
You can add extra flags to prevent two tasks from processing the same StripeEvent (maybe add a boolean "processing" and set it to true when a task takes it).
Or maybe you can use a lock file to prevent a task from running if another one is already running: you create a file with a specific location and name when the task starts and delete it when it finishes processing, and you check whether the file exists before starting a new task.
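The lock-file idea could be sketched roughly like this. It is only a sketch under assumptions: the with_lock_file helper and the lock path are hypothetical names, not part of any library. It opens the file with File::EXCL so that checking for the file and creating it happen in one step, and it deletes the file when the block finishes.

```ruby
require "tmpdir"

# Hypothetical helper: runs the block only if no other run holds the lock file.
# Returns true if the block ran, false if it was skipped.
def with_lock_file(path)
  begin
    # File::EXCL makes the open fail if the file already exists, so the
    # existence check and the creation cannot be interleaved by another task.
    file = File.open(path, File::WRONLY | File::CREAT | File::EXCL)
  rescue Errno::EEXIST
    return false # another run is in progress (or left a stale lock behind)
  end

  begin
    file.write(Process.pid.to_s) # record who holds the lock, for debugging
    file.close
    yield
    true
  ensure
    file.close unless file.closed?
    File.delete(path) if File.exist?(path)
  end
end

# Usage: wrap the body of the rake task (the path is an assumption).
lock_path = File.join(Dir.tmpdir, "process_events_demo_#{Process.pid}.lock")
ran = with_lock_file(lock_path) { puts "processing events..." }
puts "skipped, another run is active" unless ran
```

One caveat of this design: if the process is killed hard, the ensure block never runs and the stale lock file has to be removed by hand.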
Can I run delayed_job or a similar scheduling framework inside the web server, e.g. Thin or Unicorn?
If yes, how do I start it? (A code example would be very cool!)
The reason is that I want to save money while my application is still in its build-up phase, and it is hosted on Heroku.
Officially
No, there is no supported way to run delayed_jobs asynchronously within the web framework. From the documentation on running jobs, it looks like the only supported way to run a job is to run a rake task or the delayed job script. Also, it seems conceptually wrong to bend a Rack server, which was designed to handle incoming client requests, to support pulling tasks off of some queue somewhere.
The Kludge
That said, I understand that saving money sometimes trumps being conceptually perfect. Take a look at these rake tasks. My kludge is to create a special endpoint in your Rails server that you hit periodically from some remote location. Inside this endpoint, instantiate a Delayed::Worker and call .start on it with the exit_on_complete option. This way, you won't need a new dyno or command.
Be warned, it's kind of a kludgy solution and it will tie up one of your rails processes until all delayed jobs are complete. That means unless you have other rails processes, all incoming requests will block until this queue request is finished. Unicorn provides facilities to spawn worker processes. Whether or not this solution will work will also depend on your jobs and how long they take to run and your application's delay tolerances.
Edit
With the spawn gem, you can wrap your instantiation of the Delayed::Worker with a spawn block, which will cause your jobs to be run in a separate process. This means your rails process will be available to serve web requests immediately instead of blocking while delayed jobs are run. However, the spawn gem has some dependencies on ActiveRecord and I do not know what DB/ORM you are using.
Here is some example code, because it's becoming a bit hazy:
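class JobsController < ApplicationController
  def run
    # spawn (from the spawn gem) runs the block in a separate process,
    # so this request does not wait for the jobs to finish.
    spawn do
      options = {} # you'll have to get these from that rake file
      Delayed::Worker.new(options.merge(exit_on_complete: true)).start
    end
    head :ok # respond immediately; there is nothing to render
  end
end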
Here's a link to a similar question:
Is it feasible to run multiple processeses on a Heroku dyno?
Bear in mind, as the post says, if you're only using one web dyno, it will be shut down if there's no traffic going to it.
In a similar vein, you might look into:
http://blog.codeship.io/2012/05/06/Unicorn-on-Heroku.html
To save on the need for multiple web dynos whilst you're building your app (although it's still subject to the above shutdown issue).
I would suggest you might look at running on a VPS directly, rather than Heroku (check out the railscast):
http://railscasts.com/episodes/337-capistrano-recipes
Once set up, it's pretty easy to deploy to. Heroku cuts out the devops part for you.
You can run it inside a separate Unicorn worker, so it shares memory with the master process and gets restarted together with the app.
See https://gist.github.com/brauliobo/11298486
Our Rails web app has to download and unpack archives of HTML pages from FTP on request, so the user can view them in the browser.
The archive can be quite big, so the user has to wait until it downloads and unpacks on the server.
I implemented a progress bar by calling fork/Process.detach in the user's request, so that the request completes but the download/unpack process keeps running in the background. JavaScript rendered in the browser pings our server for status until everything is ready, and then redirects the user to the unpacked HTML pages.
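The fork/Process.detach pattern described here might look something like the following. This is a hedged sketch, not the poster's actual code: start_background_job and the status file are hypothetical, and the block stands in for the real download/unpack work. The status file is what a polling endpoint would read.

```ruby
require "tmpdir"

# Hypothetical helper: forks a child that runs the block and records its
# state in status_path, so a later request can poll that file.
# Returns the watcher thread created by Process.detach.
def start_background_job(status_path)
  File.write(status_path, "running")
  pid = fork do
    begin
      yield
      File.write(status_path, "done")
    rescue StandardError => e
      File.write(status_path, "failed: #{e.message}")
    ensure
      exit!(0) # leave without running at_exit handlers inherited from the parent
    end
  end
  # Detach so the finished child is reaped and does not linger as a zombie.
  Process.detach(pid)
end

# Usage: each request gets its own status file, so several jobs can run at once.
status_path = File.join(Dir.tmpdir, "unpack_demo_#{Process.pid}.status")
watcher = start_background_job(status_path) { sleep 0.1 } # stand-in for the real work
watcher.join # only for this demo; a web request would return right away
puts File.read(status_path)
```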
As long as the user requests one archive, everything goes smoothly, but if he tries to run 2 or more requests at the same time (so that more forks are started), it seems that only one of them completes, and the rest expire, time out, or get killed by Passenger(?). I suppose it's an issue with Passenger and forking.
I am not sure if it's possible to fix it somehow, so I guess I need to switch to another solution. The solution needs to permit immediate and parallel processing of downloads, so that if the user requests multiple archives, he sees download/decompression progress for all of them at the same time.
I was thinking about running a background rake job immediately, but it seems very slow to start up (also, there are a lot of cron rake tasks running every minute on our server). The reason I liked fork was that it was very fast to start. I know there is delayed_job; we also use it heavily for other tasks. But can it start multiple processes at the same time, immediately, without queues?
Solved by keeping the fork and using a single DJ worker. This way I can have as many processes starting at the same time as needed, without trouble with Passenger or modifying our product's gemset (which we are trying to avoid since it resulted in bugs in the past).
I'm not sure if forking inside a DJ worker can cause any trouble, so I asked at
running fork in delayed job
If I were free to modify the gemset, I'd probably use Resque as wrdevos suggested, or Sidekiq, or girl_friday (but that's less likely because it depends on the server running).
Use Resque: https://github.com/defunkt/resque
More on bg jobs and Resque here.
https://github.com/blog/542-introducing-resque
I'm currently using cron and "rails runner" to execute background jobs. For the most part these jobs are simple polls: "Find the records that are due to receive a reminder email. Send that email."
I've been watching my Amazon EC2 Small instance, and noticed that each time one of these cron jobs kicks in, the CPU spikes to ~99%. The teeny tiny little query inside my current job is definitely not responsible. I'm presuming that the spike is simply due to the effort of loading the full Rails environment via "rails runner".
Is there a more CPU efficient way to handle regularly scheduled batch jobs?
P.S. I know that in the particular example of sending a reminder email at time X in the future, I could use delayed_job and simply schedule the job in the future. Not every possible task fits into the delayed_job framework very well though, so I'm looking for a more traditional "cron job" type solution. Like "rails runner", but without the crazy CPU consequences.
You can use workers which don't load the Rails env, or which load it only once (like Resque).
I don't think there is a solution for this, since you do need to load a Rails environment to handle whatever it is you are handling. So on the "cron" model you will be starting up a handler, which in turn will create some load on your instance. I don't know how cloud services lend themselves to this, but I think the optimal model in your case would be a running daemon that handles jobs and forks, coupled with REE for the job execution (that helps prevent memory leaks by letting as much as possible happen in the child process, which dies at the end of the execution loop).
The daemon could be configured to accept signals (also via a job queue) that would spin off jobs doing specific things.
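The "let the work happen in a child process that dies" part of this suggestion could be sketched as below. This is a minimal illustration under assumptions, not a full daemon: run_in_child is a hypothetical helper, and the environment is assumed to be loaded once in the parent before any fork.

```ruby
# Hypothetical helper: runs the block in a forked child and waits for it.
# Anything the job allocates lives in the child and is freed when it exits.
# Returns true if the job finished without raising, false otherwise.
def run_in_child
  pid = fork do
    begin
      yield
      exit!(0)
    rescue StandardError
      exit!(1)
    end
  end
  _pid, status = Process.wait2(pid)
  status.success?
end

# Usage: the parent stays small; state changed by the job stays in the child.
leaky = []
ok = run_in_child { 10_000.times { |i| leaky << "record #{i}" } }
puts "job succeeded: #{ok}, parent array size: #{leaky.size}"
```

A long-running daemon would call something like this from its main loop, once per job, instead of booting a fresh process from cron each time.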
I have an application that checks a database every minute for any emails that are supposed to be sent out at that time. I was thinking about making this a rake task that would be run by a cron job every minute. Would there be a better solution to this?
From what I have read, this isn't ideal because rake has to load the entire rails environment every minute and this becomes expensive.
Thoughts?
Thanks.
You can use backgroundrb. This, however, will take memory away from your main Rails app, as it will spawn one Ruby instance exclusive to backgroundrb.
You can also define a SystemController (or equivalent) in your main application, with various actions corresponding to the various housekeeping tasks your application should perform. You can "prod" it from crontab using wget or curl, the advantage being that it shares resources with your main application. Depending on how paranoid you are, or on how vulnerable to DoS (or other types of attacks) exposing such a controller to the outside world would leave you, you may choose to block access to this controller's URL from addresses other than the loopback (ideally in your reverse proxy, alternatively in the controller itself).
One really simple method would be to have a script that does..
loop do
  check_and_send_messages()
  sleep 60
end
..which means you are not constantly respawning the Rails environment.
Obviously it has various flaws, but it also has some benefits (for example, with your one-Rake-per-minute approach, if the Rake task takes more than one minute, Rake will be running multiple times at once).
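A slightly more careful version of that loop might subtract the time the work took from the sleep, so runs start roughly once per interval and never overlap. This is a sketch under assumptions: run_every is a hypothetical helper, and its iterations parameter exists only so the example can stop.

```ruby
# Hypothetical helper: calls the block about once per interval_seconds.
# The next run starts only after the previous one has finished, so two runs
# can never overlap. Returns the number of runs performed.
def run_every(interval_seconds, iterations: nil)
  count = 0
  loop do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    count += 1
    break if iterations && count >= iterations

    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    remaining = interval_seconds - elapsed
    sleep(remaining) if remaining > 0 # skip the sleep if the work overran
  end
  count
end

# Usage: a short interval and a fixed number of runs, just for demonstration.
# In the real script the block would call check_and_send_messages.
runs = run_every(0.05, iterations: 3) { puts "checking for messages..." }
puts "performed #{runs} runs"
```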
Also, the Railscasts episodes Rake in Background, Starling and Workling, and Custom Daemon might give you some ideas (they are describing exactly this task)
It turns out there's actually something built just for this: ar_mailer. ar_mailer queues up the e-mails in the DB and then sends them out periodically using its ar_sendmail command, which you can run every minute.
The nice thing about ar_mailer is that it requires very little change to how you already send e-mails. You just need your mailers to inherit from ar_mailer's mailer class instead of ActionMailer::Base. Using this method, you won't have to worry about running rake tasks in the background, forking processes, or anything like that, and in effect you get a real mail server with queued messages that are deleted when the mail is actually sent. This feature is important if you have a system that sends out large numbers of e-mails en masse. I've used ar_mailer to build a social network, so I can attest to its robustness.
Here's a good article that talks about ar_mailer in depth. I would strongly advise against rolling your own solution here as Eric has built a time-tested solution to this very problem.
I do what Vlad suggested (#2), with only local requests honored, and I'm paranoid enough to also require a specific query string tacked on to the url.
I have several periodic actions set up this way.
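The check itself (loopback address plus a secret in the query string) could be expressed along these lines. It is a sketch under assumptions: periodic_request_allowed? and the token value are hypothetical, and in a Rails controller the two inputs would come from something like request.remote_ip and params.

```ruby
require "ipaddr"

# Hypothetical helper: allows a request only if it comes from the loopback
# interface and carries the expected token.
def periodic_request_allowed?(remote_ip, token, expected_token:)
  IPAddr.new(remote_ip).loopback? && token == expected_token
rescue IPAddr::InvalidAddressError
  false # an unparsable address is never allowed
end

# Usage: the values here are placeholders for what cron's curl call would send.
secret = "change-me"
puts periodic_request_allowed?("127.0.0.1", "change-me", expected_token: secret)
puts periodic_request_allowed?("203.0.113.7", "change-me", expected_token: secret)
```

Note that behind a reverse proxy every request may appear to come from the loopback address, so the address check only helps if the application sees the real client IP.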