Non blocking rails file download - ruby-on-rails

The problem I have is to do with generating large reports. I am doing so using the prawn gem to got results however I was wondering if this could be migrated to a background process.
Since I am using faye for push notifications and sidekiq for background tasks, a potential solution would be to generate the report in a sidekiq worker and use faye to notify the client of the completion of the worker. The issue with this is I don't see a way of cleaning up the generated file elegantly. I don't think generating the file within the controller action is feasible as it leads to unreasonable loading times and blocks other requests.
Is this system possible? Or am I thinking about this in the wrong way?

You are right, this is a perfectly valid to do things.
I am not sure what you means about "cleaning up the generated file". If you mean deleting from the file system, you could do it in the controller who download it, and eventually add a daily cron job who remove all remaining files.
We used such system in various projects.
Another option if the generation is really long is to send an email (if you got it) once the report is generated with the report embedded or with a link to it.

Related

Rails app deploy using delayed job serving seemingly deleted files

I have a rails 4.2.3 app running on an ubuntu server with nginx, postgres, and puma and I'm using Capistrano for deployments.
Users of my app can send bulk emails out using several different email templates I provide and making use of the Delayed_Job gem. Recently, I updated two of the email templates and I've found that occasionally (maybe 20 times out of 500) an old version of the email template is sent out rather than the current version. I've combed through my application code and I'm satisfied that the old version of the template no longer exists anywhere in the application.
What's more, users are able to edit the template by providing their own body text, and, when an old version of an email template is sent, it is sent with the correct body text that the user specifies. It's as if there's an old version of my app running on the server that occasionally commandeers the sending of an email.
Is it possible that somehow as I update my deployment using Capistrano, the old application process remains running and sometimes starts working off the delayed_job queue? Capistrano only keeps the 5 most recent versions of my application though, and none of them have the old email template that is being used. So if this was the case the old application process would need to be entirely kept in memory--so this doesn't seem possible.
Anyone have any ideas I can pursue? I'm stumped as to what could be causing this (or how this problem is even possible). Thanks so much for any help!
(PS: emails make use of the premailer gem, though I don't see how that could be involved)
The answer to this question was yes, "Is it possible that somehow as I update my deployment using Capistrano, the old application process remains running and sometimes starts working off the delayed_job queue?"
I was very surprised to find that, when you start a delayed job process, it serializes / saves into memory everything it needs to complete its tasks. A.K.A it saves / creates a new instance of the whole application, for practical purposes. Old delayed job processes weren't being properly killed when I pushed changes to my app, so I had something like 20 delayed job processes going at once. For the most part, the most recent delayed job process would take over the queue. If it was busy, however, old processes would butt in and start working off jobs using an old version of my application.
To fix the problem I had to kill off the old delayed job processes and then update my Capistrano code to make sure it wouldn't happen again.
Big thanks to #Dharam for leading me down the correct path (it's taken me a while to post this answer)

Handling long running tasks

I have a web app which has a single long running task - generating a PDF report. Various graphs are generated, and it takes about 15 sec to process in all. The report is generated by a user.
Processing the report at the time of request currently causes the process to be tied up, and more importantly (given that use of this website is not heavy) sometimes the request times out.
I am therefore redesigning the architecture of this section of the app (Rails 2.3.8). To put this in context, it's unlikely that more than a couple of these reports will be generated per day, and this is an extremely niche application, so significant further scaling isn't a major concern. I do intend to hand off the project in the future though, so stability is.
The most obvious solution I think is to use Spawn to generate a report, and fire a download link to the user in an email once it's complete. Another solution I've looked into is DelayedJob.
Can anyone who's done something similar recommend one approach over another?
delayed_job, or some other queueing mechanism is going to be the easiest thing to set up. With delayed_job you would just enqueue your worker instead of creating the PDF, and a background process on the server would be working from the queue doing whatever work was available. Using spawn to fork your whole process seems a little heavy-handed, and doesn't seem to lend itself well to other minor, but still longer running tasks (like sending emails).

Multiple Uploads to Amazon S3 from Ruby on Rails - What Background Processing System to Use?

I'm developing a Ruby on Rails application that needs to allow the user to simultaneously upload 16 high-quality images at once.
This often means somewhere around 10-20 megabytes (sometimes more), but it's the number of connections that are becoming the most pertinent issue.
The images are being sent to Amazon S3 from Paperclip, which unfortunately opens and closes a new connection for each of the 16 files. Needless to say, I need to move the system to run as background processes to keep my web server from locking up like it already is with no traffic.
My question is, out of all the Rails-based systems to use for background jobs (Starling, BackgroundRb, Spawn, etc.), if there is one that might fit the bill for this scenario better than the others (I'm new to building an in-the-background system anyway, so all of the available systems are equally new to me)?
There's no shortage of rails plugins to do async processing, and basically all of them work fine. Personally I like Delayed Job's api best.
I wouldn't use Starling or other actual queue daemons since for this task using the database to store any necessary state should be just fine.
This might help!
http://aaronvb.com/blog/2009/7/19/paperclip-amazon-s3-background-upload-using-starling-and-workling
EDIT:
It's not possible, through a normal html multipart form, to send files to the background. They have to be done through that request. If you are looking for a way around that, you can try SWFUpload and then once that's done use a background process to handle the Amazon S3 uploads.
this is also a good survey blog post http://4loc.wordpress.com/2010/03/10/background-jobs-in-ruby-on-rails/
I like swfupload, we use it on some S3 apps that we wrote. It's proven to be very fast and stable. You can have actions fired off via Ajax after the uploads, etc… We have had a ton of uploads go through it with 0 failures.

Best rails solution for a mailer that runs every minute

I have an application that checks a database every minute for any emails that are supposed to be sent out at that time. I was thinking about making this a rake task that would be run by a cron job every minute. Would there be a better solution to this?
From what I have read, this isn't ideal because rake has to load the entire rails environment every minute and this becomes expensive.
Thoughts?
Thanks.
You can use backgroundrb. This, however, will eat up memory away from your main Rails app as it will spawn one Ruby instance exclusive to backgroundrb.
You can also define a SystemController (or equivalent) in your main application, with various actions corresponding to the various household tasks your application should perform. You can "prod" it from crontab using wget or curl, the advantage being that it shares resources with your main application. Depending on how paranoid or you are, or on how vulnerable to DOS (or other types of attacks) exposing such a controller to, possibly, the outside world, you may choose to block access to this controller's URL from addresses other than the loopback (ideally in your reverse proxy, alternatively from the controller itself.)
One really simple method would be to have a script that does..
while true do
check_and_send_messages()
sleep 60
end
..which means you are not constantly respawning the Rails environment.
Obviously it has various flaws, but also has some benefits (for example, with your 1-Rake-per-minute, it the Rake task takes more than one minute, Rake will be running multiple times at once)
Also, the Railscasts episodes Rake in Background, Starling and Workling, and Custom Daemon might give you some ideas (they are describing exactly this task)
It turns out there's actually something built just for this: ar_mailer. ar_mailer queues up the e-mails into the DB and then sends them out periodically using the ar_mailer command. You can call ar_mailer every minute.
The nice thing about ar_mailer is that it basically requires very little change in terms of how you already send e-mails. You just need to inherit from ar_mailer instead of ActiveMailer. Using this method, you won't have to worry about running rake tasks in the background, forking processes, or anything like that - and in effect you get a real mail server with queued messages that are deleted when the mail is actually sent. This feature is important if you have a system that sends out large numbers of e-mail enmass. I've used ar_mailer to build a social network - so I can attest to its robustness.
Here's a good article that talks about ar_mailer in depth. I would strongly advise against rolling your own solution here as Eric has built a time-tested solution to this very problem.
I do what Vlad suggested (#2), with only local requests honored, and I'm paranoid enough to also require a specific query string tacked on to the url.
I have several periodic actions set up this way.

Best practice for Rails App to run a long task in the background?

I have a Rails application that unfortunately after a request to a controller, has to do some crunching that takes awhile. What are the best practices in Rails for providing feedback or progress on a long running task or request? These controller methods usually last 60+ seconds.
I'm not concerned with the client side... I was planning on having an Ajax request every second or so and displaying a progress indicator. I'm just not sure on the Rails best practice, do I create an additional controller? Is there something clever I can do? I want answers to focus on the server side using Rails only.
Thanks in advance for your help.
Edit:
If it matters, the http request are for PDFs. I then have Rails in conjunction with Ruport generate these PDFs. The problem is, these PDFs are very large and contain a lot of data. Does it still make sense to use a background task? Let's assume an average PDF takes about one minute to two minutes, will this make my Rails application unresponsive to any other server request during this time?
Edit 2:
Ok, after further investigation, it seems my Rails application is indeed unresponsive to any other HTTP requests after a request comes in for a large PDF. So, I guess the question now becomes: What is the best threading/background mechanism to use? It must be stable and maintained. I'm very surprised Rails doesn't have something like this built in.
Edit 3:
I have read this page: http://wiki.rubyonrails.org/rails/pages/HowToRunBackgroundJobsInRails. I would love to read about various experiences with these tools.
Edit 4:
I'm using Passenger Phusion "modrails", if it matters.
Edit 5:
I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?
The Workling plugin allow you to schedule background tasks in a queue (they would perform the lengthy task). As of version 0.3 you can ask a worker for its status, this would allow you to display some nifty progress bars.
Another cool feature with Workling is that the asynchronous backend can be switched: you can used DelayedJobs, Spawn (classic fork), Starling...
I have a very large volume site that generates lots of large CSV files. These sometimes take several minutes to complete. I do the following:
I have a jobs table with details of the requested file. When the user requests a file, the request goes in that table and the user is taken to a "jobs status" page that lists all of their jobs.
I have a rake task that runs all outstanding jobs (a class method on the Job model).
I have a separate install of rails on another box that handles these jobs. This box just does jobs, and is not accessible to the outside world.
On this separate box, a cron job runs all outstanding jobs every 60 seconds, unless jobs are still running from the last invocation.
The user's job status page auto-refreshes to show the status of the job (which is updated by the jobs box as the job is started, running, then finished). Once the job is done, a link appears to the results file.
It may be too heavy-duty if you just plan to have one or two running at a time, but if you want to scale... :)
Calling ./script/runner in the background worked best for me. (I was also doing PDF generation.) It seems like the lowest common denominator, while also being the simplest to implement. Here's a write-up of my experience.
A simple solution that doesn't require any extra Gems or plugins would be to create a custom Rake task for handling the PDF generation. You could model the PDF generation process as a state machine with states such as submitted, processing and complete that are stored in the model's database table. The initial HTTP request to the Rails application would simply add a record to the table with a submitted state and return.
There would be a cron job that runs your custom Rake task as a separate Ruby process, so the main Rails application is unaffected. The Rake task can use ActiveRecord to find all the models that have the submitted state, change the state to processing and then generate the associated PDFs. Finally, it should set the state to complete. This enables your AJAX calls within the Rails app to monitor the state of the PDF generation process.
If you put your Rake task within your_rails_app/lib/tasks then it has access to the models within your Rails application. The skeleton of such a pdf_generator.rake would look like this:
namespace :pdfgenerator do
desc 'Generates PDFs etc.'
task :run => :environment do
# Code goes here...
end
end
As noted in the wiki, there are a few downsides to this approach. You'll be using cron to regularly create a fairly heavyweight Ruby process and the timing of your cron jobs would need careful tuning to ensure that each one has sufficient time to complete before the next one comes along. However, the approach is simple and should meet your needs.
This looks quite an old thread. However, what I have down in my app, which required to run multiple Countdown Timers for different pages, was to use Ruby Thread. The timer must continue running even if the page was closed by users. Ruby makes it easy to write multi-threaded programs with the Thread class. Ruby threads are a lightweight and efficient way to achieve parallelism in your code. I hope this will help other wanderers who is looking to achieve background: parallelism/concurrent services in their app. Likewise Ajax makes it a lot easier to call a specific Rails [custom] action every second.
This really does sound like something that you should have a background process running rather than an application instance(passenger/mongrel whichever you use) as that way your application can stay doing what it's supposed to be doing, serving requests, while a background task of some kind, Workling is good, handles the number crunching. I know that this doesn't deal with the issue of progress, but unless it is absolutely essential I think that is a small price to pay.
You could have a user click the action required, have that action pass the request to the Workling queue, and have it send some kind of notification to the user when it is completed, maybe an email or something. I'm not sure about the practicality of that, just thinking out loud, but my point is that it really seems like that should be a background task of some kind.
I'm using Windows Vista 64 bit for my
development machine; however, my
production machine is Ubuntu 8.04 LTS.
Should I consider switching to Linux
for my development machine? Will the
solutions presented work on both?
Have you considered running Linux in a VM on top of Vista?
I recommend using Resque gem with it's resque-status plug-in for your heavy background processes.
Resque
Resque is a Redis-backed Ruby library for creating background jobs,
placing them on multiple queues, and processing them later.
Resque-status
resque-status is an extension to the resque queue system that provides
simple trackable jobs.
Once you run a job on a Resque worker using resque-status extension, you will be able to get info about your ongoing progresses and ability to kill a specific process very easily. See examples:
status.pct_complete #=> 0
status.status #=> 'queued'
status.queued? #=> true
status.working? #=> false
status.time #=> Time object
status.message #=> "Created at ..."
Also resque and resque-status has a cool web interface to interact with your jobs which is so cool.
There is the brand new Growl4Rails ... that is for this specific use case (among others as well).
http://www.writebetterbits.com/2009/01/update-to-growl4rails.html
I use Background Job (http://codeforpeople.rubyforge.org/svn/bj/trunk/README) to schedule tasks. I am building a small administration site that allows Site Admins to run all sorts of things you and I would run from the command line from a nice web interface.
I know you said you were not worried about the client side but I thought you might find this interesting: Growl4Rails - Growl style notifications that were developed for pretty much what you are doing judging by the example they use.
I've used spawn before and definitely would recommend it.
Incredibly simple to set up (which many other solutions aren't), and works well.
Check out BackgrounDRb, it is designed for exactly the scenario you are describing.
I think it has been around for a while and is pretty mature. You can monitor the status of the workers.
It's a pretty good idea to develop on the same development platform as your production environment, especially when working with Rails. The suggestion to run Linux in a VM is a good one. Check out Sun xVM for Open Source virtualization software.
I personally use active_messaging plugin with a activemq server (stomp or rest protocol). This has been extremely stable for us, processing millions of messages a month.

Resources