I have an application hosted on Heroku. Part of its job is to store and serve files of varying sizes, zipped up in bundles. (We're planning to phase out the bundling process at a later date, but that will require a major revamp of the consuming software.)
The 5GB limit on file uploads to S3 (anything larger requires a multipart upload) is becoming increasingly untenable for our use case. In fact, it has become an outright pain and is unacceptable to the business model.
Rails 6.1 is supposed to fix this, but we can't wait for it to come out, especially since there isn't an ETA on it yet. I tried using the alpha version off master, and got hit with an error about not being able to load coffee script (which is weird since I don't use coffeescript).
I'm now trying to find other viable alternatives that will allow our application to store files of 5GB or larger. I'm experimenting with compressing the files, but that isn't a long-term solution either.
In my application I have around 400 images that need to be displayed at various times. There will be no user uploaded imagery. In other words, I control all the pictures being used within my application.
I'm wondering what the recommended route is. Would it be best to put all the images in app/assets/images or would it be better to upload all of them to a 3rd party service like AWS?
The application will eventually be living through Heroku. Thanks.
From this question (and first comment), your total compiled code and assets cannot exceed 100MB. As long as you stay under this, you'll be fine with Heroku. However, if you exceed that, or if the number of files will change dramatically or frequently, I'd recommend Cloudinary, which gives you 500MB of free file storage and is available as a Heroku add-on.
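To see how close an image folder is to that limit, something like the following could be run locally. It is only a rough sketch: the 100MB figure is the one quoted above, interpreting it as 100 * 1024 * 1024 bytes is my assumption, and the images are only one part of the total, since code and other assets count as well. The helper names are made up for illustration.

```ruby
# Assumed interpretation of the 100MB limit quoted above.
SLUG_LIMIT_BYTES = 100 * 1024 * 1024

# Sum the size of every regular file under dir, recursively.
# Note: Dir.glob skips dotfiles by default.
def directory_size_bytes(dir)
  Dir.glob(File.join(dir, "**", "*"))
     .select { |path| File.file?(path) }
     .sum { |path| File.size(path) }
end

# True when the directory alone stays within the given limit.
def fits_in_slug?(dir, limit = SLUG_LIMIT_BYTES)
  directory_size_bytes(dir) <= limit
end
```

Calling fits_in_slug?("app/assets/images") from the application root would then give a quick yes/no for the images alone.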
I am interested in understanding the different approaches to handling large file uploads (2-5GB files) in a Rails application.
I understand that in order to transfer a file of this size it will need to be broken down into smaller parts. I have done some research, and here is what I have so far.
Server-side config will be required to accept large POST requests, and probably a 64-bit machine to handle anything over 4GB.
AWS supports multipart upload.
HTML5 FileSystemAPI has a persistent uploader that uploads the file in chunks.
A library for BitTorrent, although this requires a transmission client, which is not ideal.
Can all of these methods be resumed like FTP? The reason I don't want to use FTP is that I want to keep everything in the web app if possible. I have used CarrierWave and Paperclip, but I am looking for something that can be resumed, as uploading a 5GB file could take some time!
Of the approaches I have listed, I would like to understand what has worked well and whether there are other approaches that I may be missing. No plugins if possible; I would rather not use Java applets or Flash. Another concern is that these solutions may hold the file in memory while uploading, which is something I would rather avoid if possible.
I've dealt with this issue on several sites, using a few of the techniques you've illustrated above and a few that you haven't. The good news is that it is actually pretty realistic to allow massive uploads.
A lot of this depends on what you actually plan to do with the file after you have uploaded it... The more work you have to do on the file, the closer you are going to want it to your server. If you need to do immediate processing on the upload, you probably want to do a pure rails solution. If you don't need to do any processing, or it is not time-critical, you can start to consider "hybrid" solutions...
Believe it or not, I've actually had pretty good luck just using mod_porter. Mod_porter makes Apache do a bunch of the work that your app would normally do. It helps avoid tying up a thread and a bunch of memory during the upload. It results in a file local to your app, for easy processing. If you pay attention to the way you are processing the uploaded files (think streams), you can make the whole process use very little memory, even for what would traditionally be fairly expensive operations. This approach requires very little actual setup in your app to get working, and no real modification to your code, but it does require a particular environment (an Apache server), as well as the ability to configure it.
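To illustrate the "think streams" point, here is a minimal sketch that checksums an uploaded file by reading it in fixed-size chunks, so memory use stays around one chunk regardless of file size. The method name and the 1 MiB chunk size are my own choices for illustration, not anything mod_porter prescribes.

```ruby
require "digest"

# 1 MiB per read; an arbitrary choice for this sketch.
STREAM_CHUNK_BYTES = 1024 * 1024

# Compute a SHA-256 hex digest without loading the whole file into memory.
def streaming_sha256(path, chunk_size = STREAM_CHUNK_BYTES)
  digest = Digest::SHA256.new
  File.open(path, "rb") do |file|
    # File#read(length) returns nil at end of file, which ends the loop.
    while (chunk = file.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end
```

The same read-a-chunk, handle-a-chunk loop applies to other work on the file, such as copying it elsewhere or scanning its contents.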
I've also had good luck using jQuery-File-Upload, which supports good stuff like chunked and resumable uploads. Without something like mod_porter, this can still tie up an entire thread of execution during upload, but it should be decent on memory, if done right. This also results in a file that is "close" and, as a result, easy to process. This approach will require adjustments to your view layer to implement, and will not work in all browsers.
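On the server side, chunked uploads need something that puts each chunk in the right place. The sketch below assumes each chunk arrives with a Content-Range header of the form "bytes first-last/total" (as far as I know this is what jQuery-File-Upload sends for chunked uploads, but check your version). The method names are hypothetical, and the Rails controller wiring is left out.

```ruby
# Matches headers such as "bytes 0-999/5000".
CONTENT_RANGE_PATTERN = %r{\Abytes (\d+)-(\d+)/(\d+)\z}

# Parse a Content-Range header into [first, last, total] byte positions.
def parse_content_range(header)
  match = CONTENT_RANGE_PATTERN.match(header.to_s)
  raise ArgumentError, "unsupported Content-Range: #{header.inspect}" unless match

  first, last, total = match.captures.map(&:to_i)
  raise ArgumentError, "invalid range" if first > last || last >= total

  [first, last, total]
end

# Write one chunk at its offset in the partial file.
# Returns true when this chunk ends at the final byte of the upload.
# It does not track whether earlier chunks arrived; a resumable
# implementation would need to record which ranges it has received.
def store_chunk(partial_path, header, chunk)
  first, last, total = parse_content_range(header)
  unless chunk.bytesize == last - first + 1
    raise ArgumentError, "chunk length does not match range"
  end

  File.open(partial_path, File::RDWR | File::CREAT, 0o600) do |file|
    file.binmode
    file.seek(first)
    file.write(chunk)
  end
  last + 1 == total
end
```

Because each chunk is written straight to disk at its offset, the process only ever holds one chunk in memory.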
You mentioned FTP and BitTorrent as possible options. These are not as bad as you might think, as you can still get the files pretty close to the server. They are not even mutually exclusive, which is nice, because (as you pointed out) they do require an additional client that may or may not be present on the uploading machine. The way this works is, basically, you set up an area for users to dump files to that is visible to your app. Then, if you need to do any processing, you run a cron job (or whatever) to monitor that location for uploads and trigger your server's processing method. This does not get you the immediate response the methods above can provide, but you can set the interval to be small enough to get pretty close. The only real advantage of this method is that the protocols used are better suited to transferring large files. In my experience, the additional client requirement and fragmented process usually outweigh any benefits from that.
If you don't need any processing at all, your best bet may be to simply go straight to S3 with them. This solution falls down the second you actually need to do anything with the files other than serve them as static assets....
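The cron-driven sweep of a dump area could look roughly like this. It is a sketch under assumptions: the directory layout, the 60-second "settle" window used to guess that a transfer has finished, and the method names are all mine, and a real setup would want a more reliable completion signal than the modification time.

```ruby
require "fileutils"

# Files in dump_dir whose modification time is at least settle_seconds old,
# i.e. files that are probably no longer being written to.
def settled_uploads(dump_dir, settle_seconds, now = Time.now)
  Dir.children(dump_dir)
     .map { |name| File.join(dump_dir, name) }
     .select { |path| File.file?(path) && now - File.mtime(path) >= settle_seconds }
     .sort
end

# Move settled files into processing_dir and yield each new path,
# so the caller can trigger whatever processing the app needs.
def sweep_uploads(dump_dir, processing_dir, settle_seconds: 60, now: Time.now)
  FileUtils.mkdir_p(processing_dir)
  settled_uploads(dump_dir, settle_seconds, now).map do |path|
    target = File.join(processing_dir, File.basename(path))
    FileUtils.mv(path, target)
    yield target if block_given?
    target
  end
end
```

A cron entry (or any scheduler) would call sweep_uploads at whatever interval gives an acceptable delay.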
I do not have any experience using the HTML5 FileSystemAPI in a rails app, so I can't speak to that point, although it seems that it would significantly limit the clients you are able to support.
Unfortunately, there is not one real silver bullet - all of these options need to be weighed against your environment in the context of what you are trying to accomplish. You may not be able to configure your web server or permanently write to your local file system, for example. For what it's worth, I think jQuery-File-Upload is probably your best bet in most environments, as it only really requires modification to your application, so you could move an implementation to another environment most easily.
This project is a new protocol over HTTP to support resumable uploads of large files. It bypasses Rails by providing its own server.
http://tus.io/
http://www.jedi.be/blog/2009/04/10/rails-and-large-large-file-uploads-looking-at-the-alternatives/ has some good comparisons of the options, including some outside of Rails.
Please go through it; it was helpful in my case.
Another site worth reading is:
http://bclennox.com/extremely-large-file-uploads-with-nginx-passenger-rails-and-jquery
Please let me know if any of this does not work out
I would bypass the Rails server and post your large files (split into chunks) directly from the browser to Amazon Simple Storage Service (S3). Take a look at this post on splitting files with JavaScript. I'm a little curious how performant this setup would be, and I feel like tinkering with it this weekend.
I think that Brad Werth nailed the answer.
Just one more approach could be to upload directly to S3 (and even if you do need some reprocessing afterwards, you could theoretically use AWS Lambda to notify your app... but to be honest I'm just guessing here; I'm about to solve the same problem myself, and I'll expand on this later).
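For a sense of what "split into chunks" means for S3 specifically, here is a small sketch of the part arithmetic for a multipart upload. It relies on two limits from the S3 documentation as I understand them (every part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts). The function names are hypothetical, and no actual upload is performed.

```ruby
MIN_PART_BYTES = 5 * 1024 * 1024 # documented minimum for every part except the last
MAX_PARTS = 10_000               # documented maximum number of parts per upload

# Choose a part size: at least the preferred size, at least the minimum,
# and large enough that the file needs no more than MAX_PARTS parts.
def part_size_for(file_bytes, preferred = MIN_PART_BYTES)
  needed = (file_bytes + MAX_PARTS - 1) / MAX_PARTS # integer ceiling
  [preferred, MIN_PART_BYTES, needed].max
end

# List each part's number (1-based), byte offset, and length.
def plan_parts(file_bytes, preferred = MIN_PART_BYTES)
  raise ArgumentError, "file must not be empty" unless file_bytes.positive?

  size = part_size_for(file_bytes, preferred)
  (0...file_bytes).step(size).each_with_index.map do |offset, index|
    { part_number: index + 1, offset: offset, length: [size, file_bytes - offset].min }
  end
end
```

Each entry in the plan corresponds to one chunk the browser (or any client) would slice from the file and send as a separate part.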
http://aws.amazon.com/articles/1434
If you use CarrierWave:
https://github.com/dwilkie/carrierwave_direct_example
Uploading large files on Heroku with Carrierwave
Let me also pin down a few options that might help others looking for a real-world solution.
I have a Rails 6 app on Ruby 2.7, and the main purpose of this app is to create a Google Drive-like environment where users can upload images and videos and then have them reprocessed for high quality.
Obviously we tried local processing using Sidekiq background jobs, but it was overwhelmed during large uploads of 1GB and more.
We tried tus.io, but personally I think it is not as easy to set up as jQuery File Upload.
So we experimented with AWS, moving through the steps listed below, and it worked like a charm, uploading directly to S3 from the browser.
Using a React dropzone uploader, we upload multiple files to S3.
We set up AWS Lambda on an input bucket so that it is triggered by all types of object creation in that bucket.
This Lambda converts the file, uploads the reprocessed version to a second (output) bucket, and notifies us using AWS SNS so we can keep track of what worked and what failed.
On the Rails side, we just dynamically use the new output bucket and then serve it through an AWS CloudFront distribution.
You can check the AWS notes on MediaConvert for a step-by-step guide, and they also have well-written GitHub repos for all sorts of experimentation.
So, from the user's point of view, they can upload one large file with Transfer Acceleration enabled on the S3 bucket, and the React library shows the upload progress. Once the file is uploaded, a Rails callback API verifies its existence in the S3 bucket under a key like mybucket/user_id/file_uploaded_slug, and then it is confirmed to the user through a simple flash message.
You can also configure Lambda to notify the end user on successful upload/encoding, if needed.
Refer to this documentation: https://github.com/mike1011/aws-media-services-vod-automation/tree/master/MediaConvert-WorkflowWatchFolderAndNotification
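The verification step in that callback can be sketched as below. This is not our production code, just an illustration: the method names and flash texts are made up, and the existence check is passed in as a callable so the sketch runs without AWS credentials. In a real app that callable would presumably wrap something like Aws::S3::Object#exists? from the aws-sdk-s3 gem.

```ruby
# Build the object key in the user_id/file_uploaded_slug layout described above.
def upload_key(user_id, slug)
  raise ArgumentError, "slug must not contain '/'" if slug.to_s.include?("/")

  "#{user_id}/#{slug}"
end

# Confirm an upload by asking an existence checker about the key.
# `exists` is any callable taking (bucket, key) and returning true or false.
def confirm_upload(bucket, user_id, slug, exists:)
  key = upload_key(user_id, slug)
  if exists.call(bucket, key)
    { status: :confirmed, key: key, flash: "Upload complete" }
  else
    { status: :missing, key: key, flash: "Upload not found yet" }
  end
end
```

The controller action would then put the returned flash text into the flash and redirect.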
Hope it helps someone here.
I need to write an application that allows the user to upload large videos. Afaik, PHP stores the entire uploaded file in memory (at least by default), so you get problems with large files. Does Rails have similar problems? I need to receive files up to 2 GB.
My setup will be:
Ruby 1.8.7
Rails 3.0
Passenger 3.0
Apache 2.2
Unless you recommend something else, I would give Paperclip a try.
Regards, Johannes
It's possible, and we have a commercial website which is currently handling uploads of ~3GB for long HD videos just fine with CarrierWave - a great alternative to Paperclip.
So long as you have Apache set up correctly to accept requests that large, you probably won't have the same issues that PHP applications and the like traditionally do, with the configuration hell needed to set the maximum request size and whatnot.
Read this for the caveats, though: http://www.therailsway.com/2009/4/23/uploading-files
Edit: For what it's worth, we're using Nginx + upload module (see https://github.com/vkholodkov/nginx-upload-module for info) to do this and avoid the issues in the above article; afaik Rails loads the entire uploaded file into memory when handling uploads normally, which means you're going to need to have a significant amount of memory unless you're using something like the mod_porter plugin mentioned in the above article.
Since Heroku is a read-only filesystem I can't use paperclip to store a small quantity of files on the server. Database image storage is an option, but not particularly ideal since that may crank my client's DB size up from a few hundred KB to over the 5 MB 'free' shared DB limit (depending on size of images).
That leaves Amazon S3 as a likely solution. I understand that Heroku is hosted on EC2 (I believe?). Amazon's pricing wording was a little bit confusing when referring to S3-EC2 file transfers. If I have my client set up an S3 account and let them do file transfers to and from there, what is the pricing going to look like?
Is it cheaper from an S3 point-of-view to both upload and download data in the Rails controllers, and then feed the data to the browser using send_file? Or would it make more sense to just link straight to the image or pdf from the browser like normal?
Would my client have to pay anything at all since heroku is hosted on Amazon? I was looking for other questions related to this but there weren't any really straight answers concerning which parts of the file transfer would be charged for.
I guess the storage would cost a little (hardly anything), but what about the bandwidth? Thanks :)
Is it cheaper from an S3 point-of-view to both upload and download data in the rails controllers, and then feed the data to the browser using send_file? Or would it make more sense to just link straight to the image or pdf from the browser like normal?
From an S3 standpoint, yes, this would be free, because Heroku would be covering your transfer costs. HOWEVER: Heroku only lets a script run for 30 seconds, and during that time other clients won't be able to load the site, so this is really a terrible idea. Your best bet is to serve the files out of S3 directly, in which case, yes, your customer would be paying for transfer between S3 and the end user.
Any interaction you have with the file from Heroku (i.e. metadata and what not) will be free because it is EC2->S3.
In most cases, your pricing would be identical to what it would be if you were not using Heroku. The only case where this would change would be if your app is constantly accessing the data directly on S3 (to read metadata or load files).
You can use Paperclip on Heroku, just not with the local file system for storage. Fortunately, Paperclip can use S3 for storage. Heroku has a tech article here that covers it.
Also, when an uploaded asset is displayed on a page (look up asset_host), the image is loaded directly from your S3 bucket's URL, so you will pay Amazon for the GET request to the image and for the data transfer involved, as well as for storing the assets on S3. Have you looked at the S3 calculator to get indicative costs?