Asynchronous task system on top of Rails?

I'm currently managing a legacy Rails application running on Rails 1.2.7. One of its features lets people upload sounds, which are then converted with a command-line tool invoked via backticks. Currently I manage the conversion through a controller action and track it with an AJAX poll, but I'm having issues with timeouts, meaning the final steps of the controller action simply never run.
It's a system that requires low overhead. What could I use to manage this background conversion and then respond, in an evented way, to problems raised by it? I was looking at EventMachine but I'm still not 100% sure about it. Are there any other asynchronous task systems I could use?

First of all, wow, Rails 1.2.7. I have a similarly aged app at work which I'm slowly upgrading to Rails 3. Crazy how fast this stuff changes.
Definitely a fun problem. There are lots of directions you could take this, and I'm not sure which is best, as I'm not sure I understand your process. So I'll suggest a couple. My understanding is 1) Upload file, 2) Start conversion, 3) Report on conversion status via ajax polling.
First, running the conversion utility in a Rails controller action is definitely not the way to go, as you've discovered. 1) Your Web server or browser will probably kill the request, and 2) most Rails deployments allow for only 1 request at a time per app, meaning if you want 5 simultaneous users uploading, you need 5 copies of your app running. Obviously that won't scale.
Your "upload" action should be as quick as possible. It should 1) upload the file, and 2) either schedule or fire off a "conversion job", which some other process would handle. Your polling action would then just report on the status of that job. Of course the question is what that other process should be.
Idea 1
http://geekblog.vodpod.com/2007/08/17/background-processing-in-rails/ is likely a good place to start, though I cannot vouch for that approach.
Idea 2
I've done something similar to this, so I can give more detail. And it probably scales better. Build a light-weight companion app using Sinatra or Async Sinatra. Your Rails app would record a job for the uploaded file in your database, but then its part is done. Your Sinatra app, using EventMachine, would poll the db every few seconds and start new jobs. You might want to limit it to n concurrent jobs so you don't DOS your own box :) Your users could then poll your Sinatra app to get their conversion's status.
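For the status-polling half, a minimal sketch of the companion app might look like this. I've left out the EventMachine job-polling half and used plain Sinatra for brevity, and the Job model, its status column, and the DATABASE_URL variable are assumptions standing in for whatever the Rails app actually writes:

# companion_app.rb (run alongside the Rails app, e.g. ruby companion_app.rb)
require 'rubygems'
require 'sinatra'
require 'active_record'
require 'json'

# Share the Rails app's database (connection details are an assumption)
ActiveRecord::Base.establish_connection(ENV['DATABASE_URL'])

# Hypothetical jobs table that the Rails app inserts into on upload
class Job < ActiveRecord::Base; end

# Users poll this endpoint for their conversion's status
get '/status/:id' do
  job = Job.find(params[:id])
  content_type :json
  { :id => job.id, :status => job.status }.to_json
end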
Idea 3
Similar to 2, but instead of a companion Web app, it's just a small Ruby program using EventMachine. You would just start this program on your server and let it run forever. Each job would write its status back to the database, which your users could poll through your Rails app. I think this is my favorite. Wireframe:
#!/usr/bin/env ruby
require 'rubygems'
require 'eventmachine'

# Returns new jobs from the database (stubbed out here)
def new_jobs
  []
end

# Convert the file by shelling out to the command-line tool
def convert(job)
  `convert #{job.path}`
  job
end

# Callback invoked when a conversion is complete
def callback(job)
  puts "Finished #{job.path}!"
end

EventMachine.run do
  # Poll for new jobs every 5 seconds
  EventMachine.add_periodic_timer(5) do
    # defer takes callables: the first runs on a thread-pool thread,
    # and its return value is handed to the second when it finishes
    new_jobs.each do |job|
      EventMachine.defer(proc { convert(job) }, proc { |done| callback(done) })
    end
  end
end
These suggestions are admittedly from a 10,000 ft. view, but I hope there's something in there to get you started.

Related

How to process data on the server with Rails and Heroku

I am developing a website using Ruby on Rails and I am doing a bit of rough planning. I can and have deployed Rails websites before, just adding to the database and retrieving from it based on my use case, but this time around it's a bit different. I am adding to the database, but I will need the data to be processed on the server before it is sent back to the user, or when he decides to retrieve it. What I do not get is how I am going to process the data on the server. I know this doesn't follow the normal pattern for asking questions; I would search for it with Google except I don't know what I am looking for. A nudge in the right direction will do.
What I want to do exactly is have users register and click a button (request) which puts the user's id in an array. What I need to do on the server is to randomly (or not randomly) connect two users based on some qualities. This program keeps running indefinitely, so that a user can come back later to check whether he has already been connected with someone.
This kind of logic typically belongs in the controller, or perhaps on the models. You should read the Rails docs, particularly on controllers: http://guides.rubyonrails.org/action_controller_overview.html
I think you may find a lot of benefit from running a background job for this that is constantly looking for matches. You could have an infinitely running Sidekiq process that is queued up with users. Then once one finishes, just fire it up again.
Or you could create a rake task that does a User.find_each and have it run again when the task finishes. But this would make things blocking if you end up having a lot of users. I'd recommend one job per user and just bloat the system with them. This way you can scale out both horizontally and vertically.
You'll want to learn about ActiveJob and Sidekiq to achieve what you're looking for :). Sidekiq requires Redis, which you'll have to set up as well. I'd recommend the redis-rails gem to help with the integration.
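As a rough sketch of the "one job per user, re-enqueue until matched" idea, something like the following could work. MatchUserJob, User#matched?, and User#find_match! are hypothetical names, not a real API:

# app/jobs/match_user_job.rb (a sketch, not a drop-in implementation)
class MatchUserJob < ActiveJob::Base
  queue_as :default

  def perform(user_id)
    user = User.find(user_id)
    return if user.matched?            # hypothetical predicate

    if (partner = user.find_match!)    # hypothetical matching logic
      Rails.logger.info "Matched #{user.id} with #{partner.id}"
    else
      # No partner yet: retry shortly instead of looping forever
      self.class.set(wait: 30.seconds).perform_later(user_id)
    end
  end
end

The button click would then just call MatchUserJob.perform_later(current_user.id) and return immediately; the user's later visit only has to check whether a match has been recorded.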
To go off BenMorganIO's answer, I think this is a job for a background worker. This is a job that is processed in the background, so it doesn't slow up your app. A good example of this is firing off an email in the background.
There are primarily 3 gems I've seen for this:
delayed_job
Resque
Sidekiq (just celebrated 5 years!)
Those should point you in the right direction.

Rails ActiveJob for infinite loop

The application I am developing needs an infinite loop to handle business logic that is completely separate from user input (users would only view the results). Since this is a break from traditional MVC, I thought an Active Job would be a good place to put it.
The logic in the loop would poll microcontrollers on the same network. I have little to no ability to change the code on these so I have to adapt to the unique protocol they are using. When the microcontrollers respond, the server will need to do some calculations and store those in the database.
The job would be launched when the server application is. Only one instance of it should exist, so I don't want to put it in my models or controllers. I have tried launching it from a couple of places in the config folder, but that caused an uninitialized constant NameError.
What would be the correct way to launch a job when the server is initialized? Is there another approach you would take?
I am a webdev newb using Ruby 2.2.0 and Rails 4.2.0.
How about using scheduled jobs? For example, you could schedule a job to run every minute that polls for whatever you need.
Have a look at this Gem:
https://github.com/javan/whenever
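With whenever, the polling interval goes in config/schedule.rb. A minimal sketch, where PollMicrocontrollersJob is a hypothetical job name:

# config/schedule.rb (written to the crontab with `whenever --update-crontab`)
every 1.minute do
  runner "PollMicrocontrollersJob.perform_later"
end

Note that cron's resolution is one minute, matching the "every minute" suggestion above; for sub-minute polling you would loop inside the job itself.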

Rails periodic task

I have a ruby on rails app in which I'm trying to find a way to run some code every few seconds.
I've found lots of info and ideas using cron, or cron-like implementations, but these are only accurate down to the minute, and/or require external tools. I want to kick the task off every 15 seconds or so, and I want it to be entirely self contained within the application (if the app stops, the tasks stop, and no external setup).
This is being used for background generation of cache data. Every few seconds, the task will assemble some data, and then store it in a cache which gets used by all the client requests. The task is pretty slow, so it needs to run in the background and not block client requests.
I'm fairly new to ruby, but have a strong perl background, and the way I'd solve this there would be to create an interval timer & handler which forks, runs the code, and then exits when done.
It might be even nicer to just simulate a client request and have the Rails controller fork itself. That way I could kick off the task by hitting its URI (though since the task will be running every few seconds, I doubt I'll ever need to, but it might have future use). It would also be trivial to have the controller call whatever method is being called by the periodic task scheduler (once I have one).
I'd suggest the whenever gem https://github.com/javan/whenever
It allows you to specify a schedule like:
every 15.minutes do
  MyClass.do_stuff
end
It writes the crontab entries for you, so there's no hand-editing cron or monkeying with external services.
Generally speaking, there's no built-in way that I know of to create a periodic task within the application. Rails is built on Rack, and it expects to receive HTTP requests, do something, and then return. So you just have to manage the periodic task externally yourself.
I think given the frequency that you need to run the task, a decent solution could be just to write yourself a simple rake task that loops forever, and to kick it off at the same time that you start your application, using something like Foreman. Foreman is often used like this to manage starting up/shutting down background workers along with their apps. On production, you may want to use something else to manage the processes, like Monit.
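For example, the looping rake task plus a Procfile might look like this; the task and class names are assumptions:

# lib/tasks/cache_refresh.rake
# Started alongside the app via Foreman, with a Procfile like:
#   web:    bundle exec rails server
#   worker: bundle exec rake cache:refresh_loop
namespace :cache do
  desc 'Rebuild the cache data every 15 seconds, forever'
  task :refresh_loop => :environment do
    loop do
      CacheBuilder.rebuild!   # hypothetical cache-building code
      sleep 15
    end
  end
end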
You can either write your own method, something like:
class MyWorker
  def self.work
    loop do
      # do your work here
      sleep 15
    end
  end
end
Run it with rails runner MyWorker.work and it will keep running as a separate process in the background.
Or you can use something like Resque, but that's a different approach: something adds a task to the queue, and meanwhile a worker fetches whatever job is in the queue and tries to finish it.
So it depends on your needs.
I know it is an old question, but maybe this answer will be helpful for someone. There is a gem called crono.
Crono is a time-based background job scheduler daemon (just like Cron) for Ruby on Rails.
Crono is pure Ruby. It doesn't use Unix cron or other platform-dependent things, so you can use it on all platforms supported by Ruby. It persists job state to your database using Active Record. You have full control over the job-performing process, and since it's Ruby, you can understand and modify it to fit your needs.
The awesome thing about crono is that its code is self-explanatory. To perform a task periodically you can just do:
Crono.perform(YourJob).every 2.days
Maybe you can also do:
Crono.perform(YourJob).every 30.seconds
Anyway you really can do a lot of things. Another example could be:
Crono.perform(TestJob).every 1.week, on: :monday, at: "15:30"
I suggest this gem instead of whenever because whenever uses the Unix cron table, which is not always available.
Throwing out a solution just because it looks somewhat elegant and answers the question without any extra gems. In my scenario I wanted to run some code, but only after all my Sidekiq workers were done doing their thing.
First I defined a method to check if any workers were working...
def workers_working?
  workers = Sidekiq::Workers.new.map do |_process_id, _thread_id, work|
    work
  end
  workers.size > 0
end
Then we just call the method with a loop which sleeps between calls.
sleep 5 while workers_working?
Use something like delayed job, and requeue it every so often?
Use thin or another server that runs on EventMachine, then just use the timers that are part of EventMachine. Example, in config/application.rb:
EM.add_periodic_timer(2) do
  do_this_every_2_sec
end

Best rails solution for a mailer that runs every minute

I have an application that checks a database every minute for any emails that are supposed to be sent out at that time. I was thinking about making this a rake task that would be run by a cron job every minute. Would there be a better solution to this?
From what I have read, this isn't ideal because rake has to load the entire rails environment every minute and this becomes expensive.
Thoughts?
Thanks.
You can use backgroundrb. This, however, will eat up memory away from your main Rails app, as it spawns a Ruby instance used exclusively by backgroundrb.
You can also define a SystemController (or equivalent) in your main application, with various actions corresponding to the various housekeeping tasks your application should perform. You can "prod" it from crontab using wget or curl, the advantage being that it shares resources with your main application. Depending on how paranoid you are, or how vulnerable exposing such a controller to the outside world would leave you to DoS (or other) attacks, you may choose to block access to this controller's URL from addresses other than the loopback (ideally in your reverse proxy, alternatively in the controller itself).
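A sketch of that pattern, assuming a SystemController with a send_mails action (the names and the housekeeping method are illustrative, not from the original answer):

# app/controllers/system_controller.rb (Rails 2-era style)
class SystemController < ApplicationController
  before_filter :require_loopback   # only honor local requests

  def send_mails
    Email.send_pending!   # hypothetical housekeeping method
    render :text => 'ok'
  end

  private

  def require_loopback
    head :forbidden unless request.remote_ip == '127.0.0.1'
  end
end

The crontab entry that prods it would then be something like:
* * * * * curl -s http://127.0.0.1/system/send_mails > /dev/null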
One really simple method would be to have a script that does..
loop do
  check_and_send_messages
  sleep 60
end
..which means you are not constantly respawning the Rails environment.
Obviously it has various flaws, but also some benefits (for example, with your one-Rake-per-minute approach, if the Rake task takes more than one minute, multiple Rake processes will be running at once)
Also, the Railscasts episodes Rake in Background, Starling and Workling, and Custom Daemon might give you some ideas (they describe exactly this task)
It turns out there's actually something built just for this: ar_mailer. ar_mailer queues up the e-mails in the DB and then sends them out periodically via its ar_sendmail command, which you can run every minute.
The nice thing about ar_mailer is that it requires very little change to how you already send e-mail: your mailers just inherit from ActionMailer::ARMailer instead of ActionMailer::Base. Using this method you won't have to worry about running rake tasks in the background, forking processes, or anything like that, and in effect you get a real mail queue whose messages are deleted once the mail is actually sent. This matters if your system sends out large volumes of e-mail en masse. I've used ar_mailer to build a social network, so I can attest to its robustness.
Here's a good article that talks about ar_mailer in depth. I would strongly advise against rolling your own solution here as Eric has built a time-tested solution to this very problem.
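From memory of the era's API, usage looked roughly like this; treat the class and flag names as approximations rather than a definitive reference:

# A Rails 2.x-era mailer using ar_mailer (sketch)
class Notifier < ActionMailer::ARMailer
  def welcome(user)
    recipients user.email
    from       'noreply@example.com'
    subject    'Welcome!'
    body       :user => user
  end
end

Deliveries are written to an emails table instead of being sent inline, and a cron entry such as "* * * * * ar_sendmail --once" drains the queue.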
I do what Vlad suggested (#2), with only local requests honored, and I'm paranoid enough to also require a specific query string tacked on to the url.
I have several periodic actions set up this way.

Best practice for Rails App to run a long task in the background?

I have a Rails application that unfortunately, after a request to a controller, has to do some crunching that takes a while. What are the best practices in Rails for providing feedback or progress on a long-running task or request? These controller methods usually run for 60+ seconds.
I'm not concerned with the client side... I was planning on having an AJAX request every second or so and displaying a progress indicator. I'm just not sure about the Rails best practice: do I create an additional controller? Is there something clever I can do? I want answers to focus on the server side, using Rails only.
Thanks in advance for your help.
Edit:
If it matters, the HTTP requests are for PDFs, which I have Rails generate in conjunction with Ruport. The problem is these PDFs are very large and contain a lot of data. Does it still make sense to use a background task? Assuming an average PDF takes one to two minutes, will this make my Rails application unresponsive to other requests during that time?
Edit 2:
Ok, after further investigation, it seems my Rails application is indeed unresponsive to other HTTP requests while a request for a large PDF is being served. So I guess the question now becomes: what is the best threading/background mechanism to use? It must be stable and maintained. I'm very surprised Rails doesn't have something like this built in.
Edit 3:
I have read this page: http://wiki.rubyonrails.org/rails/pages/HowToRunBackgroundJobsInRails. I would love to read about various experiences with these tools.
Edit 4:
I'm using Phusion Passenger ("mod_rails"), if it matters.
Edit 5:
I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?
The Workling plugin allows you to schedule background tasks in a queue (they would perform the lengthy task). As of version 0.3 you can ask a worker for its status, which would allow you to display some nifty progress bars.
Another cool feature of Workling is that the asynchronous backend can be switched: you can use DelayedJob, Spawn (classic fork), Starling...
I have a very large volume site that generates lots of large CSV files. These sometimes take several minutes to complete. I do the following:
I have a jobs table with details of the requested file. When the user requests a file, the request goes in that table and the user is taken to a "jobs status" page that lists all of their jobs.
I have a rake task that runs all outstanding jobs (a class method on the Job model).
I have a separate install of rails on another box that handles these jobs. This box just does jobs, and is not accessible to the outside world.
On this separate box, a cron job runs all outstanding jobs every 60 seconds, unless jobs are still running from the last invocation.
The user's job status page auto-refreshes to show the status of the job (which is updated by the jobs box as the job is started, running, then finished). Once the job is done, a link appears to the results file.
It may be too heavy-duty if you just plan to have one or two running at a time, but if you want to scale... :)
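A bare-bones version of that jobs-table setup might look like this; the Job model, its columns, and the generate_csv! method are assumptions standing in for your actual schema:

# app/models/job.rb (sketch; columns: user_id, state, result_path, etc.)
class Job < ActiveRecord::Base
  # Class method the rake task calls to drain outstanding work
  def self.run_outstanding!
    find(:all, :conditions => { :state => 'pending' }).each do |job|
      job.update_attribute(:state, 'running')
      job.generate_csv!   # hypothetical long-running work
      job.update_attribute(:state, 'done')
    end
  end
end

# lib/tasks/jobs.rake (invoked by cron on the dedicated jobs box)
namespace :jobs do
  task :run => :environment do
    Job.run_outstanding!
  end
end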
Calling ./script/runner in the background worked best for me. (I was also doing PDF generation.) It seems like the lowest common denominator, while also being the simplest to implement. Here's a write-up of my experience.
A simple solution that doesn't require any extra Gems or plugins would be to create a custom Rake task for handling the PDF generation. You could model the PDF generation process as a state machine with states such as submitted, processing and complete that are stored in the model's database table. The initial HTTP request to the Rails application would simply add a record to the table with a submitted state and return.
There would be a cron job that runs your custom Rake task as a separate Ruby process, so the main Rails application is unaffected. The Rake task can use ActiveRecord to find all the models that have the submitted state, change the state to processing and then generate the associated PDFs. Finally, it should set the state to complete. This enables your AJAX calls within the Rails app to monitor the state of the PDF generation process.
If you put your Rake task within your_rails_app/lib/tasks then it has access to the models within your Rails application. The skeleton of such a pdf_generator.rake would look like this:
namespace :pdfgenerator do
  desc 'Generates PDFs etc.'
  task :run => :environment do
    # Code goes here...
  end
end
As noted in the wiki, there are a few downsides to this approach. You'll be using cron to regularly create a fairly heavyweight Ruby process and the timing of your cron jobs would need careful tuning to ensure that each one has sufficient time to complete before the next one comes along. However, the approach is simple and should meet your needs.
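Fleshing out the skeleton above, the task body might look something like this; the Report model, its state column, and generate_pdf! are assumptions, not part of the original answer:

namespace :pdfgenerator do
  desc 'Generates PDFs for all submitted requests'
  task :run => :environment do
    Report.find(:all, :conditions => { :state => 'submitted' }).each do |report|
      report.update_attribute(:state, 'processing')
      report.generate_pdf!   # hypothetical Ruport-backed generation
      report.update_attribute(:state, 'complete')
    end
  end
end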
This looks like quite an old thread. However, what I have done in my app, which needed to run multiple countdown timers for different pages, was to use Ruby's Thread class. The timers had to keep running even if the page was closed by users. Ruby makes it easy to write multi-threaded programs with the Thread class, and Ruby threads are a lightweight and efficient way to achieve parallelism in your code. I hope this helps other wanderers looking to achieve background parallelism/concurrency in their apps. Likewise, AJAX makes it a lot easier to call a specific custom Rails action every second.
This really does sound like something you should have a background process running rather than an application instance (Passenger/Mongrel, whichever you use); that way your application can keep doing what it's supposed to be doing, serving requests, while a background task of some kind (Workling is good) handles the number crunching. I know this doesn't deal with the issue of progress, but unless that is absolutely essential, I think it is a small price to pay.
You could have the user click the action required, have that action push the request onto the Workling queue, and have it send some kind of notification to the user when it completes, maybe an email or something. I'm not sure about the practicality of that, just thinking out loud, but my point is that this really seems like it should be a background task of some kind.
"I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?"
Have you considered running Linux in a VM on top of Vista?
I recommend using the Resque gem with its resque-status plug-in for your heavy background processes.
Resque
Resque is a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.
Resque-status
resque-status is an extension to the resque queue system that provides simple trackable jobs.
Once you run a job on a Resque worker using the resque-status extension, you can get info about its ongoing progress and kill a specific job very easily. See examples:
status.pct_complete #=> 0
status.status #=> 'queued'
status.queued? #=> true
status.working? #=> false
status.time #=> Time object
status.message #=> "Created at ..."
Resque and resque-status also have a web interface for interacting with your jobs, which is very handy.
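For reference, a trackable job with resque-status looks roughly like this (adapted from the plug-in's README; details may vary by version):

# A job whose progress can be queried while it runs
class ConvertJob
  include Resque::Plugins::Status

  def perform
    total = options['steps'].to_i
    total.times do |i|
      at(i + 1, total, "Step #{i + 1} of #{total}")   # report progress
      sleep 1                                          # stand-in for real work
    end
  end
end

# Enqueue, then poll:
job_id = ConvertJob.create('steps' => 10)
status = Resque::Plugins::Status::Hash.get(job_id)
status.pct_complete   #=> e.g. 40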
There is the brand new Growl4Rails ... that is for this specific use case (among others as well).
http://www.writebetterbits.com/2009/01/update-to-growl4rails.html
I use Background Job (http://codeforpeople.rubyforge.org/svn/bj/trunk/README) to schedule tasks. I am building a small administration site that allows Site Admins to run all sorts of things you and I would run from the command line from a nice web interface.
I know you said you were not worried about the client side, but I thought you might find this interesting: Growl4Rails, Growl-style notifications that were developed for pretty much what you are doing, judging by the example they use.
I've used spawn before and definitely would recommend it.
Incredibly simple to set up (which many other solutions aren't), and works well.
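For reference, spawn's fork-and-forget style is roughly as follows; the controller, model, and generate_pdf! method are my own illustrative names, and the exact API may vary by version:

# Controller using the spawn plugin to fork off the heavy work (sketch)
class ReportsController < ApplicationController
  def create
    report = Report.create!(params[:report])
    # spawn forks a child process; the parent returns immediately
    spawn do
      report.generate_pdf!   # hypothetical long-running method
    end
    redirect_to report_path(report)
  end
end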
Check out BackgrounDRb; it is designed for exactly the scenario you are describing.
I think it has been around for a while and is pretty mature. You can monitor the status of the workers.
It's a pretty good idea to develop on the same platform as your production environment, especially when working with Rails. The suggestion to run Linux in a VM is a good one. Check out Sun xVM for open-source virtualization software.
I personally use the active_messaging plugin with an ActiveMQ server (Stomp or REST protocol). This has been extremely stable for us, processing millions of messages a month.
