I am using CarrierWave to upload video files and Transloadit to encode them. What is the best way to do this in Rails with Delayed Job? Please suggest an approach.

There are a number of options (purely the HTTP side of things, not even talking about programming languages).
The answer really depends on your environment(s), skills, support etc. What I've noticed in the 'real world' is that if you send >100MB to a server in a single HTTP request, it tends to fail. Your clients most likely have really bad upload speeds (most SOHO internet connections are >10M down but <1M up), so you'll eventually hit a timeout somewhere (router/NAT tables/firewall/web server/scripts).
1) Really large POST (bad practice, could potentially consume a lot of memory, failure means you have to start all over and leaves your server open to DDoS)
2) Using an 'upload module' for Apache/nginx (requires compilation and generally a lot of headache to get it set up but it works well, may not work with all hosting situations)
3) Streaming within your client and server scripts. Works well. I would also recommend chunking your uploads to <10MB so that when a chunk fails you can restart just that chunk.
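To make option 3 a bit more concrete, here is a minimal sketch of chunking with per-chunk retry. It is not tied to any particular HTTP client: `upload_in_chunks`, `sender`, and the 5 MB chunk size are names and values I made up for illustration, and `sender` stands in for whatever actually transmits a chunk.

```ruby
require 'stringio'

CHUNK_SIZE = 5 * 1024 * 1024 # assumed value, below the <10MB suggested above

# Reads +io+ in fixed-size chunks and hands each one to +sender+ (any callable
# taking the chunk index and its bytes). A failed chunk is retried on its own,
# so one failure does not restart the whole upload.
def upload_in_chunks(io, sender, chunk_size: CHUNK_SIZE, max_attempts: 3)
  index = 0
  while (chunk = io.read(chunk_size))
    attempts = 0
    begin
      attempts += 1
      sender.call(index, chunk)
    rescue IOError, SystemCallError
      retry if attempts < max_attempts
      raise # give up on this chunk after max_attempts tries
    end
    index += 1
  end
  index # number of chunks sent
end

# Usage with an in-memory "file" and a sender that only collects the chunks.
sent = []
upload_in_chunks(StringIO.new('abcdefghij'), ->(i, bytes) { sent << [i, bytes] }, chunk_size: 4)
```

A real sender would also have to tell the server which chunk it is sending so the server can reassemble them; that part is left out here.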

I don't have much experience with DJ, but background processing follows a similar approach with all tools.
First you should just upload your file somewhere (filesystem, Amazon S3, whatever).
DJ will not handle this task. You should do it in your controller action.
Then, after the upload, you can create a DJ task that encodes your video and does other related tasks.
For example, you can run DJ after commit in your video model, like this:
    class Video < ActiveRecord::Base
      after_commit :encode_in_background

      def encode_in_background
        delay.encode # Delayed Job runs #encode later in a worker process
      end

      def encode
        # code that runs in background
      end
    end
My example can have incorrect syntax, but the main idea is that you upload video via controller and then run background processing job.
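Another way to express the same idea is a small job object. As far as I know, Delayed Job can enqueue any object that responds to `perform`, so a plain Struct is enough. In this sketch `EncodeVideoJob` and `VideoEncoder` are hypothetical names; `VideoEncoder` only stands in for the real encoding call (for example a request to Transloadit).

```ruby
# Hypothetical stand-in for the real encoding step; it only records which
# videos it was asked to encode.
module VideoEncoder
  @encoded = []

  class << self
    attr_reader :encoded

    def encode(video_id)
      @encoded << video_id
    end
  end
end

# The job carries only the record id, so it serializes cheaply into the queue.
EncodeVideoJob = Struct.new(:video_id) do
  def perform
    VideoEncoder.encode(video_id)
  end
end

# In the Rails app, after the upload has been saved in the controller:
#   Delayed::Job.enqueue(EncodeVideoJob.new(video.id))
# Outside Rails the job can be exercised directly:
EncodeVideoJob.new(42).perform
```

Passing the id rather than the whole record keeps the queued payload small and makes the worker load a fresh copy of the video when it runs.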


How to stream downloads with rack.hijack and IO without them cutting off prematurely?

Our UI server needs to make certain downloadable files available to users. The files live on a specialized storage server. The UI server enforces some permissions and uses HTTP Basic Auth user/pwd to get the files off the storage server, but the user should never know these storage credentials so the files are streamed through our UI server to users.
We need to stream these through our UI server running Rails 4.2.11 and Ruby 2.4.9. Up until now we've been assigning an enumerator to the response body, as described in "Ruby on Rails 3: Streaming data through Rails to client", but it has been quite unreliable and we've had many cut-off or incomplete downloads.
It appeared that the rack-hijack method ( https://blog.chumakoff.com/en/posts/rails_sse_rack_hijacking_api ) was newer and, we hoped, more reliable, but it also cuts off downloads when a slower client downloads a large file.
It appears that IO buffers are filling up and creating EOF or other errors since the download is streamed faster from the storage server than it is usually streamed through the UI by users.
I'm getting the rack-hijack IO stream (PhusionPassenger::Utils::UnseekableSocket) and writing the results of the chunks from the GET response to it.
I've tried many different http client gems and they all have similar problems, I think because it is filling the output buffer when slow clients download, not because they have problems, though the errors look that way.
When http clients have problems in this scenario they variously complain of either EOF or error reading from socket: Could not parse data entirely (1 != 0).
I've tried different versions of Passenger and tried various general solutions.
I've looked at a similar question at Rails: How to send file from S3 to remote server . In the answer, he suggests using <obj>.stat.size to check on the buffer and not have it overflow and have it match the user's download speed more. However, the stream I'm given by rack-hijack doesn't seem to have any way to check the size or buffer that I can find (PhusionPassenger::Utils::UnseekableSocket). If it exposed a way to do this, I could likely do this.
I've looked through the Passenger documentation for buffering information and changed some buffering settings but it didn't help.
I've looked into the methods on the stream I get from Passenger and also on the .to_io settings but I'm having a hard time finding tools I think I need and having trouble finding much documentation.
If I match the download speeds on both ends, there are no problems. Something like wget --limit-rate=2m http://localhost:3000/download_test/stream1 lets me throttle the speed of the client, and then I usually get an error around 180MB into the download.
It's running on Amazon Linux but runs either much better or without many problems on a local Mac development machine. It's hard to tell exactly if it's still problematic or just not as much since the problem can be a little dependent on size and other factors.
    # a simplified version of the code, using a public URL
    def stream1
      url = 'https://www.spacetelescope.org/static/archives/images/publicationtiff40k/heic1502a.tif'
      response.headers['Content-Type'] = 'image/tiff'
      response.headers['Content-Disposition'] = 'attachment; filename="funn.tif"'
      response.headers["X-Accel-Buffering"] = 'no'
      response.headers["Cache-Control"] = 'no-cache'
      response.headers["Last-Modified"] = Time.zone.now.ctime.to_s
      response.headers["rack.hijack"] = proc do |stream|
        Thread.new do
          begin
            response = HTTP.get(url)
            response.body.each do |chunk|
              stream.write(chunk)
            end
          rescue HTTP::Error => ex
            logger.error("while streaming: #{ex}")
            logger.error("while streaming: #{ex.backtrace.join("\n")}")
          ensure
            stream.close
          end
        end
      end
      head :ok
    end
The results I'm looking for are to make downloads reliable no matter what (reasonable) speed of the client downloading from the UI.
Thanks for any help you can give since I'm feeling stuck.
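One general technique for keeping a fast upstream from running ahead of a slow client is to put a bounded buffer between the reader and the writer. The sketch below shows only that idea with plain Ruby IO objects and a SizedQueue; `pump` and its parameters are names I made up, and I have not verified that this addresses the Passenger-specific behaviour described above.

```ruby
require 'stringio'

# Copies +source+ to +sink+ through a bounded queue. The reader thread blocks
# in SizedQueue#push once +max_chunks+ chunks are waiting, so it cannot get
# ahead of a slow writer by more than roughly max_chunks * chunk_size bytes.
def pump(source, sink, chunk_size: 16 * 1024, max_chunks: 8)
  queue = SizedQueue.new(max_chunks)
  reader = Thread.new do
    while (chunk = source.read(chunk_size))
      queue.push(chunk)
    end
    queue.push(:done) # sentinel telling the writer there is nothing left
  end

  total = 0
  begin
    while (chunk = queue.pop) != :done
      total += sink.write(chunk)
    end
    reader.join
  ensure
    reader.kill # stops a blocked reader if the write failed; harmless otherwise
  end
  total
end

# Usage with in-memory IO objects standing in for the two sockets.
copied = pump(StringIO.new('x' * 50_000), StringIO.new, chunk_size: 1024, max_chunks: 2)
```

In the controller above, the source would be the upstream HTTP response and the sink the hijacked stream; whether the upstream server tolerates being read that slowly is a separate question.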

What is the best approach to handle large file uploads in a rails app?

I am interested in understanding the different approaches to handling large file uploads in a Rails application, 2-5Gb files.
I understand that in order to transfer a file of this size it will need to be broken down into smaller parts, I have done some research and here is what I have so far.
Server-side config will be required to accept large POST requests and probably a 64bit machine to handle anything over 4Gb.
AWS supports multipart upload.
HTML5 FileSystemAPI has a persistent uploader that uploads the file in chunks.
A library for BitTorrent, although this requires a transmission client, which is not ideal.
Can all of these methods be resumed like FTP? The reason I don't want to use FTP is that I want to keep this in the web app if possible. I have used CarrierWave and Paperclip, but I am looking for something that can be resumed, as uploading a 5Gb file could take some time!
Of the approaches I have listed, I would like to understand what has worked well and whether there are other approaches I may be missing. No plugins if possible; I would rather not use Java applets or Flash. Another concern is that these solutions hold the file in memory while uploading, which is also a constraint I would rather avoid if possible.
I've dealt with this issue on several sites, using a few of the techniques you've illustrated above and a few that you haven't. The good news is that it is actually pretty realistic to allow massive uploads.
A lot of this depends on what you actually plan to do with the file after you have uploaded it... The more work you have to do on the file, the closer you are going to want it to your server. If you need to do immediate processing on the upload, you probably want to do a pure rails solution. If you don't need to do any processing, or it is not time-critical, you can start to consider "hybrid" solutions...
Believe it or not, I've actually had pretty good luck just using mod_porter. Mod_porter makes apache do a bunch of the work that your app would normally do. It helps not tie up a thread and a bunch of memory during the upload. It results in a file local to your app, for easy processing. If you pay attention to the way you are processing the uploaded files (think streams), you can make the whole process use very little memory, even for what would traditionally be fairly expensive operations. This approach requires very little actual setup to your app to get working, and no real modification to your code, but it does require a particular environment (apache server), as well as the ability to configure it.
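As a small illustration of the "think streams" point: the sketch below computes a checksum of an uploaded file by reading it in fixed-size blocks, so memory use stays near the block size however large the file is. `streaming_sha256` and the 1 MB block size are my own choices for the example, not something mod_porter provides.

```ruby
require 'digest'
require 'stringio'

# Computes a SHA-256 digest of +io+ block by block instead of loading the
# whole upload into memory.
def streaming_sha256(io, block_size: 1024 * 1024)
  digest = Digest::SHA256.new
  while (block = io.read(block_size))
    digest.update(block)
  end
  digest.hexdigest
end

# Usage: any IO works, e.g. File.open(path, 'rb') { |f| streaming_sha256(f) }.
checksum = streaming_sha256(StringIO.new('some uploaded bytes'))
```

The same read-a-block, process-a-block loop applies to copying, scanning, or re-encoding the file.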
I've also had good luck using jQuery-File-Upload, which supports good stuff like chunked and resumable uploads. Without something like mod_porter, this can still tie up an entire thread of execution during upload, but it should be decent on memory, if done right. This also results in a file that is "close" and, as a result, easy to process. This approach will require adjustments to your view layer to implement, and will not work in all browsers.
You mentioned FTP and bittorrent as possible options. These are not as bad of options as you might think, as you can still get the files pretty close to the server. They are not even mutually exclusive, which is nice, because (as you pointed out) they do require an additional client that may or may not be present on the uploading machine. The way this works is, basically, you set up an area for them to dump to that is visible by your app. Then, if you need to do any processing, you run a cron job (or whatever) to monitor that location for uploads and trigger your servers processing method. This does not get you the immediate response the methods above can provide, but you can set the interval to be small enough to get pretty close. The only real advantage to this method is that the protocols used are better suited to transferring large files, the additional client requirement and fragmented process usually outweigh any benefits from that, in my experience.
If you don't need any processing at all, your best bet may be to simply go straight to S3 with them. This solution falls down the second you actually need to do anything with the files other than serve them as static assets....
I do not have any experience using the HTML5 FileSystemAPI in a rails app, so I can't speak to that point, although it seems that it would significantly limit the clients you are able to support.
Unfortunately, there is not one real silver bullet - all of these options need to be weighed against your environment in the context of what you are trying to accomplish. You may not be able to configure your web server or permanently write to your local file system, for example. For what it's worth, I think jQuery-File-Upload is probably your best bet in most environments, as it only really requires modification to your application, so you could move an implementation to another environment most easily.
This project is a new protocol over HTTP to support resumable uploads of large files. It bypasses Rails by providing its own server.
http://www.jedi.be/blog/2009/04/10/rails-and-large-large-file-uploads-looking-at-the-alternatives/ has some good comparisons of the options, including some outside of Rails.
Please go through it. It was helpful in my case.
Also another site to go to is:-
Please let me know if any of this does not work out
I would by-pass the rails server and post your large files(split into chunks) directly from the browser to Amazon Simple Storage. Take a look at this post on splitting files with JavaScript. I'm a little curious how performant this setup would be and I feel like tinkering with this setup this weekend.
I think that Brad Werth nailed the answer
One approach could be to upload directly to S3 (and even if you do need some reprocessing afterwards, you could theoretically use AWS Lambda to notify your app... but to be honest I'm just guessing here; I'm about to solve the same problem myself and I'll expand on this later).
If you use CarrierWave, see "Uploading large files on Heroku with Carrierwave".
Let me also pin down a few options that might help others looking for a real-world solution.
I have a Rails 6 app on Ruby 2.7 whose main purpose is to provide a Google Drive-like environment where users can upload images and videos and then have them reprocessed at high quality.
Obviously we did try local processing with Sidekiq background jobs, but it was overwhelmed by large uploads of 1GB and more.
We did try tuts.io, but personally I think it is not quite easy to set up, just like jQuery File Upload.
So we experimented with AWS, moving through the steps listed below, and it worked like a charm, uploading directly to S3 from the browser.
Using the React Dropzone Uploader, we upload multiple files to S3.
We set up an AWS Lambda function on an input bucket, triggered for all types of object creation in that bucket.
This Lambda converts the file, uploads the reprocessed version to a second (output) bucket, and notifies us through AWS SNS so we can keep track of what worked and what failed.
On the Rails side, we just dynamically use the new output bucket and serve it through an AWS CloudFront distribution.
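The Lambda step above could look roughly like this if written in Ruby. This is only a sketch of reading the S3 event: `uploaded_objects` is a helper name I made up, the function name `handler` is whatever you configure in Lambda, and the actual conversion (MediaConvert or anything else) is left out.

```ruby
require 'cgi'

# Extracts [bucket, key] pairs from an S3 event notification. Object keys
# arrive URL-encoded in the event (spaces as '+'), so they are decoded here.
def uploaded_objects(event)
  event.fetch('Records', []).map do |record|
    s3 = record.fetch('s3')
    [s3.fetch('bucket').fetch('name'), CGI.unescape(s3.fetch('object').fetch('key'))]
  end
end

# Entry point in the shape the AWS Lambda Ruby runtime calls.
def handler(event:, context:)
  uploaded_objects(event).each do |bucket, key|
    # The conversion job would be started here; this sketch only logs.
    puts "would convert s3://#{bucket}/#{key}"
  end
end
```

Keeping the parsing in its own method makes it easy to test locally with a hand-written event hash before deploying.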
You may check the AWS notes on MediaConvert for a step-by-step guide; they also have well-written GitHub repos for all sorts of experimentation.
So, from the user's point of view, they can upload one large file with acceleration enabled on the S3 bucket while the React library shows the upload progress. Once the upload finishes, a Rails callback API verifies that the object exists in the S3 bucket under a key like mybucket/user_id/file_uploaded_slug, and the result is then confirmed to the user through a simple flash message.
You can also configure Lambda to notify the end user on successful upload/encoding, if needed.
Refer to this documentation: https://github.com/mike1011/aws-media-services-vod-automation/tree/master/MediaConvert-WorkflowWatchFolderAndNotification
Hope it helps someone here.

Asynchronous task system on top of Rails?

I'm currently managing a legacy rails application that's running on rails 1.2.7. One of the functionalities is allowing people to upload sounds and have them converted using a command line tool using backticks. Currently I use an AJAX poll to do the management of the conversion through a controller action, but I'm having issues with timeouts, meaning that the final elements of the controller action simply are not occurring.
It's a system that requires low overheads, what could I use to manage this background conversion and then respond in an evented system to issues created by the background conversion? I was looking at eventmachine but I'm still not 100% with it, are there any other kind of asynchronous task based systems I could use?
First of all, wow, Rails 1.2.7. I have a similarly aged app at work which I'm slowly upgrading to Rails 3. Crazy how fast this stuff changes.
Definitely a fun problem. There are lots of directions you could take this, and I'm not sure which is best, as I'm not sure I understand your process. So I'll suggest a couple. My understanding is 1) Upload file, 2) Start conversion, 3) Report on conversion status via ajax polling.
First, running the conversion utility in a Rails controller action is definitely not the way to go, as you've discovered. 1) Your Web server or browser will probably kill the request, and 2) most Rails deployments allow for only 1 request at a time per app, meaning if you want 5 simultaneous users uploading, you need 5 copies of your app running. Obviously that won't scale.
Your "upload" action should be as quick as possible. It should 1) upload the file, and 2) either schedule or fire off a "conversion job", which some other process would handle. Your polling action would then just report on the status of that job. Of course the question is what that other process should be.
Idea 1
http://geekblog.vodpod.com/2007/08/17/background-processing-in-rails/ is likely a good place to start, though I cannot vouch for that approach.
Idea 2
I've done something similar to this, so I can give more detail. And it probably scales better. Build a light-weight companion app using Sinatra or Async Sinatra. Your Rails app would record a job for the uploaded file in your database, but then its part is done. Your Sinatra app, using EventMachine, would poll the db every few seconds and start new jobs. You might want to limit it to n concurrent jobs so you don't DOS your own box :) Your users could then poll your Sinatra app to get their conversion's status.
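The "limit it to n concurrent jobs" part does not need EventMachine to illustrate. Here is a minimal sketch with plain Ruby threads and a Queue; `run_with_limit` is a made-up name and the jobs can be any non-nil objects.

```ruby
# Runs every job in +jobs+ through the given block, using at most
# +concurrency+ worker threads at any one time.
def run_with_limit(jobs, concurrency:, &handler)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # once drained, #pop returns nil and the workers stop

  workers = Array.new(concurrency) do
    Thread.new do
      while (job = queue.pop)
        handler.call(job)
      end
    end
  end
  workers.each(&:join)
end

# Usage: convert at most two files at a time (the block body is a placeholder).
run_with_limit(%w[a.wav b.wav c.wav], concurrency: 2) { |path| path.upcase }
```

Because each conversion shells out to an external tool, threads are enough here: the Ruby side mostly waits on the child process.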
Idea 3
Similar to 2, but instead of a companion Web app, it's just a small Ruby program using EventMachine. You would just start this program on your server and let it run forever. Each job would write its status back to the database, which your users could poll through your Rails app. I think this is my favorite. Wireframe:
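    #!/usr/bin/env ruby
    require 'rubygems'
    require 'eventmachine'

    # Returns new jobs from the database
    def new_jobs
      [] # placeholder: query the database for unprocessed jobs here
    end

    # Convert the file
    def convert(job)
      `convert #{job.path}`
    end

    # Callback when conversion is complete
    def callback(job)
      puts "Finished #{job.path}!"
    end

    EventMachine.run do
      # Run every 5 seconds
      EventMachine.add_periodic_timer(5) do
        new_jobs.each do |job|
          # defer takes callables: the first runs on EventMachine's thread
          # pool, the second on the reactor thread once the first has finished
          EventMachine.defer(proc { convert(job) }, proc { |_result| callback(job) })
        end
      end
    end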
These suggestions are admittedly from a 10,000 ft. view, but I hope there's something in there to get you started.

Multiple Uploads to Amazon S3 from Ruby on Rails - What Background Processing System to Use?

I'm developing a Ruby on Rails application that needs to allow the user to upload 16 high-quality images at once.
This often means somewhere around 10-20 megabytes (sometimes more), but it's the number of connections that are becoming the most pertinent issue.
The images are being sent to Amazon S3 from Paperclip, which unfortunately opens and closes a new connection for each of the 16 files. Needless to say, I need to move the system to run as background processes to keep my web server from locking up like it already is with no traffic.
My question is: out of all the Rails-based systems for background jobs (Starling, BackgroundRb, Spawn, etc.), is there one that fits this scenario better than the others? (I'm new to building an in-the-background system anyway, so all of the available systems are equally new to me.)
There's no shortage of rails plugins to do async processing, and basically all of them work fine. Personally I like Delayed Job's api best.
I wouldn't use Starling or other actual queue daemons since for this task using the database to store any necessary state should be just fine.
This might help!
It's not possible, through a normal html multipart form, to send files to the background. They have to be done through that request. If you are looking for a way around that, you can try SWFUpload and then once that's done use a background process to handle the Amazon S3 uploads.
this is also a good survey blog post http://4loc.wordpress.com/2010/03/10/background-jobs-in-ruby-on-rails/
I like swfupload, we use it on some S3 apps that we wrote. It's proven to be very fast and stable. You can have actions fired off via Ajax after the uploads, etc… We have had a ton of uploads go through it with 0 failures.

Best practice for Rails App to run a long task in the background?

I have a Rails application that unfortunately after a request to a controller, has to do some crunching that takes awhile. What are the best practices in Rails for providing feedback or progress on a long running task or request? These controller methods usually last 60+ seconds.
I'm not concerned with the client side... I was planning on having an Ajax request every second or so and displaying a progress indicator. I'm just not sure on the Rails best practice, do I create an additional controller? Is there something clever I can do? I want answers to focus on the server side using Rails only.
Thanks in advance for your help.
If it matters, the http request are for PDFs. I then have Rails in conjunction with Ruport generate these PDFs. The problem is, these PDFs are very large and contain a lot of data. Does it still make sense to use a background task? Let's assume an average PDF takes about one minute to two minutes, will this make my Rails application unresponsive to any other server request during this time?
Edit 2:
Ok, after further investigation, it seems my Rails application is indeed unresponsive to any other HTTP requests after a request comes in for a large PDF. So, I guess the question now becomes: What is the best threading/background mechanism to use? It must be stable and maintained. I'm very surprised Rails doesn't have something like this built in.
Edit 3:
I have read this page: http://wiki.rubyonrails.org/rails/pages/HowToRunBackgroundJobsInRails. I would love to read about various experiences with these tools.
Edit 4:
I'm using Phusion Passenger ("modrails"), if it matters.
Edit 5:
I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?
The Workling plugin allows you to schedule background tasks in a queue (they would perform the lengthy task). As of version 0.3 you can ask a worker for its status, which would allow you to display some nifty progress bars.
Another cool feature of Workling is that the asynchronous backend can be switched: you can use Delayed Job, Spawn (classic fork), Starling...
I have a very large volume site that generates lots of large CSV files. These sometimes take several minutes to complete. I do the following:
I have a jobs table with details of the requested file. When the user requests a file, the request goes in that table and the user is taken to a "jobs status" page that lists all of their jobs.
I have a rake task that runs all outstanding jobs (a class method on the Job model).
I have a separate install of rails on another box that handles these jobs. This box just does jobs, and is not accessible to the outside world.
On this separate box, a cron job runs all outstanding jobs every 60 seconds, unless jobs are still running from the last invocation.
The user's job status page auto-refreshes to show the status of the job (which is updated by the jobs box as the job is started, running, then finished). Once the job is done, a link appears to the results file.
It may be too heavy-duty if you just plan to have one or two running at a time, but if you want to scale... :)
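The "unless jobs are still running from the last invocation" rule can be enforced with a lock file. This is a minimal sketch using a non-blocking flock; `run_exclusively` and the lock path are names I made up, and this is not necessarily how the setup above does it.

```ruby
require 'tmpdir'

# Runs the block only if no other invocation currently holds the lock file.
# Returns true when the block ran, false when another run is still active.
def run_exclusively(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    # LOCK_NB makes flock return false immediately instead of waiting.
    return false unless lock.flock(File::LOCK_EX | File::LOCK_NB)

    yield
    true
  end # closing the file releases the lock
end

# Usage from the cron-invoked task (the block body is a placeholder):
lock_path = File.join(Dir.tmpdir, 'outstanding_jobs.lock')
run_exclusively(lock_path) { :run_all_outstanding_jobs }
```

The operating system drops the lock if the process dies, so a crashed run does not block later ones the way a stale PID file can.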
Calling ./script/runner in the background worked best for me. (I was also doing PDF generation.) It seems like the lowest common denominator, while also being the simplest to implement. Here's a write-up of my experience.
A simple solution that doesn't require any extra Gems or plugins would be to create a custom Rake task for handling the PDF generation. You could model the PDF generation process as a state machine with states such as submitted, processing and complete that are stored in the model's database table. The initial HTTP request to the Rails application would simply add a record to the table with a submitted state and return.
There would be a cron job that runs your custom Rake task as a separate Ruby process, so the main Rails application is unaffected. The Rake task can use ActiveRecord to find all the models that have the submitted state, change the state to processing and then generate the associated PDFs. Finally, it should set the state to complete. This enables your AJAX calls within the Rails app to monitor the state of the PDF generation process.
If you put your Rake task within your_rails_app/lib/tasks then it has access to the models within your Rails application. The skeleton of such a pdf_generator.rake would look like this:
    namespace :pdfgenerator do
      desc 'Generates PDFs etc.'
      task :run => :environment do
        # Code goes here...
      end
    end
As noted in the wiki, there are a few downsides to this approach. You'll be using cron to regularly create a fairly heavyweight Ruby process and the timing of your cron jobs would need careful tuning to ensure that each one has sufficient time to complete before the next one comes along. However, the approach is simple and should meet your needs.
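To make the submitted / processing / complete idea concrete, here is a minimal sketch in plain Ruby. `PdfJob` is a hypothetical in-memory stand-in for the ActiveRecord model, and `process_submitted` is roughly what the body of the Rake task would do on each run.

```ruby
# Allowed transitions between the states named above.
TRANSITIONS = { 'submitted' => 'processing', 'processing' => 'complete' }.freeze

# Hypothetical stand-in for a row in the table that tracks PDF requests.
PdfJob = Struct.new(:id, :state) do
  def advance!
    self.state = TRANSITIONS.fetch(state) do
      raise ArgumentError, "no transition from #{state}"
    end
  end
end

# Picks up every submitted job, marks it as processing, runs the block to
# generate the PDF, then marks it complete.
def process_submitted(jobs)
  jobs.select { |job| job.state == 'submitted' }.each do |job|
    job.advance! # submitted -> processing
    yield job    # generate the PDF here
    job.advance! # processing -> complete
  end
end

# Usage: only the submitted job is processed.
jobs = [PdfJob.new(1, 'submitted'), PdfJob.new(2, 'complete')]
process_submitted(jobs) { |job| puts "generating PDF for job #{job.id}" }
```

With a real model, each `advance!` would also save the record so the AJAX polling action sees the new state.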
This looks like quite an old thread. However, what I have done in my app, which needed to run multiple countdown timers for different pages, was to use Ruby's Thread class. The timers must continue running even if the page is closed by the user. Ruby makes it easy to write multi-threaded programs with the Thread class, and threads are a lightweight way to get concurrency in your code (on MRI the global interpreter lock keeps Ruby code from running truly in parallel, but threads still overlap while waiting on I/O or sleeping). I hope this helps other wanderers who are looking to run background or concurrent services in their app. Likewise, Ajax makes it a lot easier to call a specific Rails [custom] action every second.
This really does sound like something that you should have a background process running rather than an application instance(passenger/mongrel whichever you use) as that way your application can stay doing what it's supposed to be doing, serving requests, while a background task of some kind, Workling is good, handles the number crunching. I know that this doesn't deal with the issue of progress, but unless it is absolutely essential I think that is a small price to pay.
You could have a user click the action required, have that action pass the request to the Workling queue, and have it send some kind of notification to the user when it is completed, maybe an email or something. I'm not sure about the practicality of that, just thinking out loud, but my point is that it really seems like that should be a background task of some kind.
"I'm using Windows Vista 64 bit for my development machine; however, my production machine is Ubuntu 8.04 LTS. Should I consider switching to Linux for my development machine? Will the solutions presented work on both?"
Have you considered running Linux in a VM on top of Vista?
I recommend using the Resque gem with its resque-status plug-in for your heavy background processes.
Resque is a Redis-backed Ruby library for creating background jobs, placing them on multiple queues, and processing them later.
resque-status is an extension to the resque queue system that provides simple trackable jobs.
Once you run a job on a Resque worker using the resque-status extension, you can get information about its progress and kill a specific job very easily. See examples:
status.pct_complete #=> 0
status.status #=> 'queued'
status.queued? #=> true
status.working? #=> false
status.time #=> Time object
status.message #=> "Created at ..."
Also, Resque and resque-status have a nice web interface for interacting with your jobs.
There is the brand new Growl4Rails ... that is for this specific use case (among others as well).
I use Background Job (http://codeforpeople.rubyforge.org/svn/bj/trunk/README) to schedule tasks. I am building a small administration site that allows Site Admins to run all sorts of things you and I would run from the command line from a nice web interface.
I know you said you were not worried about the client side but I thought you might find this interesting: Growl4Rails - Growl style notifications that were developed for pretty much what you are doing judging by the example they use.
I've used spawn before and definitely would recommend it.
Incredibly simple to set up (which many other solutions aren't), and works well.
Check out BackgrounDRb, it is designed for exactly the scenario you are describing.
I think it has been around for a while and is pretty mature. You can monitor the status of the workers.
It's a pretty good idea to develop on the same development platform as your production environment, especially when working with Rails. The suggestion to run Linux in a VM is a good one. Check out Sun xVM for Open Source virtualization software.
I personally use the active_messaging plugin with an ActiveMQ server (STOMP or REST protocol). This has been extremely stable for us, processing millions of messages a month.
