Background processing in rails - ruby-on-rails

This might seem like a FAQ on stackoverflow, but my requirements are a little different. While I have previously used BackgroundRB and DJ for running background processes in ruby, my requirement this time is to run some heavy analytics and mathematical computations on a huge set of data, and I need to do this only about the first 15 days of the month. Going by this, I am tempted to use cron and run a ruby script to accomplish this goal.
What I would like to know / understand is:
1 - Is using cron a good idea (cause I'm not a system admin, and so while I have basic idea of cron, I'm not overly confident of doing it perfectly)
2 - Can we somehow modify DJ to run only on the first 15 days of the month (with / without using cron), and then just stop and exit once all the jobs in the queue for the day are over (don't want it to ping the DB every time for a new job...whatever the jobs will be in the queue when DJ starts, that will be all).
I'm not sure if I have put the question in the right manner, but any help in this direction will be much appreciated.
Thanks

With cron's "minute hour day month dayofweek" time specification, 3:33am 1st through fifteenth of every month would be "33 3 1-15 * *"

Using cron would be really easy, and you have a lot of example and it is reliable.
Anyway here are few screen casts from Railcasts you may want to look at:
Starling and Workling
Custom Daemon

Yeah, why not? Go with cron. It's really well-tested in the wild, well suited for running periodical tasks and incredibly easy to use. You don't even need to learn the crontab syntax (although it's very easy) - just drop your script into /etc/cron.daily (this option might be available only on some Linux distros).
I'm not sure about the "only first fiteen days of the month" thing, but you can easily handle this condition inside your task, right?
EDIT:
Check out par's answer to see how to run the task only at a certain range of days.

I also had this requirement. I followed the "Automatic Periodic Tasks" recipe 75 in the Advanced Rails Recipes book. The recipe is written by David Bock. It has some code snippets and guidelines on how this can be achieved using cron and capistrano. However there is an unsolved (but mentioned) issue regarding users/permissions that has to be on the target machine. It is not really difficult to make it right, you just have to remember to do it and put it in you deployment capistrano scripts.
It seems that David Bock has continued to work on this and has now creaated a gem for use with cron: See his blog, and follow crondonkulous on github. Crondonkulous may very well take care of this user/permission thing and more, I haven't tried it.
Jarl

Related

Rails Background Process (Heroku Rails 3+)

I'm am going to set up some functionality for my app that is Rails 3.2.3 and on Heroku. The idea is to have a task, or job (or whatever you want to call it) run every day, to make sure user information from the external API is up to date with the user information in my db. I'm curious what is the the best way to set this up? Should it be a cron job that runs a rake task?
Seems like there are quite a few ways to do this and I'm interested in the ways others are doing this. The only way I can think to do it is to run a rake task in a cron job, but would love to figure out what best practices are, or the most simple way to do it. Seems like there are a lot of ways to skin this cat... lots of different tools out there too.
If there was a pure rails way to do this, I think that would be better so I don't have to screw around with every system I place my app onto.
For a simple sync job that runs once a day, I believe having a cronjob would be sufficient and likely more stable in the long run.
Honestly, solutions such as Resque and Sidekiq is a bit overkill in my opinion (for your needs). You're still required to use a scheduler to send messages to these systems.
Check out the gem 'whenever' if you're looking at making the deployment and writing of crontabs easier: https://github.com/javan/whenever/
Railscasts regarding 'whenever': http://railscasts.com/episodes/164-cron-in-ruby
There are two options. They're better than options you mentioned in your question
Resque.
Sidekiq.
Try the later one. It is faster, lightweight and based on multithreading so there isn't interference with system. You'll need to look into scheduler of both the gem for processing everyday.
Hope this helps!
Use the Heroku scheduler add on to the handle scheduling itself. You can have it run a rake task, resque, or whatever.
Here is a few to choose from :
resque (with resque-scheduler. But you have to use redis with it)
rufus-scheduler ( if you want something simple, resque uses rufus-scheduler itself)
You may try delayed_job with a few tricks like this one. Not that great for scheduling but can use your application database.

Best current rails background task method?

I am trying to find out the best way to run scripts in the background. I have been looking around and found plenty of options, but many/most seem to have become inactive in the past few years. Let me describe my needs.
The rails app is basically a front-end to configure when and how these scripts will be run. The scripts run and generate reports and send email alerts. So the user must be able to configure the start times and how often these scripts will run dynamically. The scripts themselves should have access to the rails environment in order to save the resulting reports in the DB.
Just trying to figure out the best method from the myriad of options.
I think you're looking for a background job queuing system.
For that, you're either looking for resque or delayed_job. Both support scheduling tasks at some point in the future -- delayed_job does this natively, whereas resque has a plugin for it called resque_scheduler.
You would enqueue jobs in the background with parameters that you specify, and then at the time you selected they'll be executed. You can set jobs to recur indefinitely or a fixed number of times (at least with resque-scheduler, not sure about delayed_job).
delayed_job is easier to set up since it saves everything in the database. resque is more robust but requires you to have redis in your stack -- but if you do already it's pretty much the ideal solution for your problem.
I recently learned about Sidekiq, and I think it is really great.
There's also a RailsCast about it - Sidekiq.
Take a look at the gem whenever at https://github.com/javan/whenever.
It allows you to schedule tasks like cron jobs.
Works very well under linux, and the last commit was 14 days ago. A friend of mine used it in a project and was pretty satisfied with it.
edit: take a look at the gem delayed_job as well, it is good for executing long tasks in the background. Useful when creating a cron job only to start other tasks.

How reliable is windows task scheduler for scheduling code to run repeatedly?

I have a bit of code that needs to sit on a windows server 2003 machine and run every minute.
What is the recommended way of handling this? Is it ok to design it as a console service and just have the task scheduler hit it ever minute? (is that even possible?) Should I just suck it up and write it as a windows service?
Since it needs to run every single minute, I would suggest writing a Windows Service. It is not very complicated, and if you never did this before, it would be great for you to learn how it is done.
Calling the scheduled task every minute is not something I would recommend.
I would say suck it up and write it as a Windows service. I've not found scheduled tasks to be very reliable and when it doesn't run, I have yet to find an easy way to find out why it hasn't.
Windows Scheduled Tasks has been fairly reliable for our purposes and we favor them in almost all cases over Windows Services due to their ease of installing and advanced recovery features. The always on nature of a windows service could end up being a problem if a part of the code that was written ends ups getting locked up or looped in a piece of code that it shouldn't be in. We generally write our code in a fashion similar to this
Init();
Run();
CleanUp();
Then as part of the Scheduled Task we put a time limit on how long the process can run and have it kill the process if it runs for longer. If we do have a piece of code that is having trouble Scheduled Tasks will kill it and the process will start up in the next minute.
if you need to have it run every minute, I would build it as a windows service. I wouldn't use the scheduler for anything less than a daily task.
I would say that it depends on what it was doing, but in general I am always in favor of having the fewest layers. If you write it as a console service and use the task scheduler then you have two places to maintain going forward.
If you write it as a windows service then you only have one fewer places to check in case something goes wrong.
While searching for scheduled service help, i came across to a very good article by Jon Galloway.
There are various diadvantages if a windows service is used for scheduled task. I agreed with it. I would suggest to use Task Scheduled, simple in implementation. Please refer to detailed information of implementing the task scheduler. Hope this info helps in finalizing the implementation approach.
The only other point to consider, is that if you're job involves some kind of database interaction, consider looking into the integration/scheduling services provided by your database.
For example, creating an SSIS package for your SQL Server related service may seem a bit like overkill, but it can be integrated nicely with the environment and will have its own logging/error checking mechanisms already in place.
I agree, it is kind of a waste of effort to create even a console executable and schedule it to be run every minute. I would suggest exploring something like Quartz.Net. That way you can create a simple job and schedule it to run every minute.

Spinning Background Tasks in Rails

What is the preferred way to create a background task for a Rails application? I've heard of Starling/Workling and the good ol' script/runner, but I am curious which is becoming the defacto way to manage this need?
Thanks!
Clarification: I like the idea of Rake in Background, but the problem is, I need something that is running constantly or every 10 hours. I am not going to have the luxury of sitting on a web request, it will need to be started by the server asynchronous to the activities occurring on my site.
Ryan Bates created three great screencasts that might really help you:
Rake in Background
Starling and Workling
Custom Daemon
He talks about the various pros and cons for using each one. This should help you get started.
It depends on your needs.
Try out delayed_job, which was created by Tobi delayed_job (last updated 2011), a Shopify founder.
There are forks by DHH deleayed_job (last updated 2008), and collectiveidea delayed_job (last updated 20 days ago as of 6/28/2018).
I usually rely on cronjob scheduling as it gives the flexibility without having to write separate code to schedule it. Anything that can be executed from shell, can be scheduled! Be it any script (ruby / rake task / py / bash / any other you like), cronjob scheduling can be easily achieved.
If running on windows, one can use scheduled tasks
Hope this helps.
async_observer is the best. It doesn't do all kinds of dumb busy wait stuff or lose jobs on worker crashes like starling, no DB polling, etc... and it integrates into rails remarkably well.
I push tons of jobs through it and it pretty much doesn't care.
Most of the plugins that have been mentioned will do the job, but if all you need is a Rake task run on a set schedule, then there's really no need to start throwing more architecture at it.
Just add a cron job which executes
"cd /path/to/rails/app; RAILS_ENV=production rake run:my:task"
Why reinvent the wheel, when Unix like operating systems have been running tasks on a schedule for decades?
I have used the daemons plugin in the past.
While I don't know if it is becoming a standard, I have had great success with BackgroundRB. I have several workers, some are long running tasks triggered by a user action while others are started on a schedule.
Have a look at Taskr. It's basically like cron, but with a RESTful web interface. You can use it to schedule tasks to periodically connect to your Rails app and trigger arbitrary code (via the Taskr4rails plugin). It's meant to fit nicely into a system built around RESTful services, plus it can notify you if a task returns an error, fails to run, etc.

How to do a rolling restart of a cluster of mongrels

Anybody know a nice way to restart a mongrel cluster via capistrano in a "rolling" style, eg, one mongrel at a time. Would be great to have a bit of wait time in there as well for each, to let the mongrel load the rails app up as well.
I've done some searching, and haven't found too much, so looking for help before I dive into the mongrel_cluster gem myself.
Thanks!
I agree with the seesaw approach more than the rolling approach you are seeking. The problem is that you end up in situations where load balancing can throw users back and forth between different versions of the application while you are transitioning.
The solutions we came up with (before finding SeeSaw, which we don't use) was to take half of the mongrels off line from the load balancer. Shut them down. Update them. Start them up. Put those mongrels back online in the load balancer and take the other half off. Shut the second half down. Update the second half. Start them up. This greatly minimizes the time where you have two different versions of the application running simultaneously.
I wrote a windows bat file to do this. (Deploying on Windows is not recommended, btw)
It is very important to note that having database migrations can make the whole approach a little dangerous. If you have only additive migrations, you can run those at any time before the deployment. If you are removing columns, you need to do it after the deployment. If you are renaming columns, it is better to split it into a create a new column and copy data into it migration to run before deployment and a separate script to remove the old column after deployment. In fact, it may be dangerous to use your regular migrations on a production database in general if you don't make a specific effort to organize them. All of this points to making more frequent deliveries so each update is lower risk and less complex, but that's a subject for another response.
Seesaw is a gem found in the Rails Oceania Rubyforge Project that provides this kind of functionality to mongrel clusters. However, the project may be suffering from some bit-rot not havain had a release since 2007. Still worth a look even just to pinch the ideas :)
#!/bin/bash
for PIDFILE in /tmp/mongrel.*; do
PID=$(cat ${PIDFILE})
kill ${PID}
${RUN_MONGREL_CMD} ${PID}
sleep 2
done

Resources