Uploading large files with Paperclip - ruby-on-rails

I want to be able to routinely upload about three 20 MB files to a specific type of record in a Rails app. So far I'm using Paperclip, and I get a pretty average success rate and a lot of EOF (bad content body) errors.
What can I do to improve the situation? Googling "rails large upload" doesn't turn up much.

Uploading large files, while slow, should succeed. The EOF (bad content body) error seems unrelated.
What are you using to upload the files? A standard multipart web form? And is anything unusual happening with the server configuration? Or are you using Pow? Apparently Pow users have reported similar issues.
For large files, I typically use S3 for storage and would recommend uploading directly to S3.
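As a minimal sketch of the direct-to-S3 idea, assuming the aws-sdk-s3 gem (the bucket name and key below are made up, and credentials/region come from the usual AWS environment):

require "securerandom"
require "aws-sdk-s3"

# Generate a short-lived URL the browser can PUT the file to directly,
# so the 20 MB request bodies never pass through the Rails process.
signer = Aws::S3::Presigner.new
upload_url = signer.presigned_url(
  :put_object,
  bucket:     "my-uploads-bucket",            # hypothetical bucket
  key:        "uploads/#{SecureRandom.uuid}",
  expires_in: 15 * 60                         # link stays valid for 15 minutes
)
# Hand upload_url to the client-side uploader and store the key on the
# record once the PUT succeeds.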

Related

What is the best way to create a large ZIP file with Rails data and send it to Amazon S3 storage?

I have a feature in a Ruby on Rails application that backs up all of a user's account data so they can download it in a ZIP file and store it locally.
To create the ZIP file, I'm doing the following:
Using Zip::OutputStream to open a ZIP file stream.
Going through each of the relevant models in the user's account, converting all the records in that model to a CSV, then adding each CSV to the ZIP file.
Sending the resulting ZIP file to AWS S3.
Here is some pseudo code to illustrate the process:
output_stream = Zip::OutputStream.write_buffer do |zos|  # builds the whole archive in an in-memory StringIO
  models_to_backup.each do |model|
    csv = model.convert_to_csv_file                      # one CSV per model's records
    zos.put_next_entry("csv_files/#{model.name}.csv")
    zos.write csv
  end
end
output_stream.rewind
SendFileToS3(output_stream)                              # pseudo-call that pushes the buffer to S3
This works fine for smaller files, but most users have upwards of 100,000 records. Thus, as the Zip::OutputStream is built up, I quickly run into memory issues (I'm hosting the app on Heroku), because the entire output stream is held in memory until it's sent.
Is there a more memory efficient way to create these ZIP files? Is there a way to stream the ZIP to S3 in batches as it's created, to avoid creating the entire ZIP file in memory? Or, will I just need to provision a higher memory-limit server to accomplish this?
Answering my own question in case anyone sees this in the future. After looking into this for several days, I couldn't find a great solution.
My somewhat hacky workaround is to just temporarily switch to a Heroku Performance-L dyno (which has 14 GB of memory) when I'm running the backups, which only has to happen once a month. Probably not the most elegant solution, but it works.

Is the chunking option required with plupload and asp.net MVC?

I have seen various posts where developers have opted for the chunking option to upload files, particularly large files.
It seems that if one uses the chunking option, the files are uploaded and progressively saved to disk, is this correct? If so, it seems there needs to be a secondary operation to process the files.
If the config is set to allow large files, should plupload work without chunking up to the allowed file size for multiple files?
It seems that if one uses the chunking option, the files are uploaded and progressively saved to disk, is this correct?
If you mean "automatically saved to disk", as far as I know, it is not correct. Your MVC controller will have to handle as many requests as there are chunks, concatenate each chunk into a temp file, then rename the file after handling the last chunk.
It is handled this way in the upload.php example of plupload.
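As I understand plupload's protocol, each chunk arrives as an ordinary multipart POST carrying chunk (a zero-based index) and chunks (the total count) alongside the blob, so the server-side logic is the same in any framework. Since this thread is Rails-centric, here is a hypothetical Rails controller doing roughly what upload.php does (all names are made up):

require "fileutils"

class UploadsController < ApplicationController
  def create
    chunk     = params.fetch(:chunk, 0).to_i
    chunks    = params.fetch(:chunks, 1).to_i
    blob      = params[:file]
    name      = File.basename(params[:name] || blob.original_filename)
    part_path = Rails.root.join("tmp", "#{name}.part")

    # Chunk 0 truncates any stale temp file; later chunks append to it.
    File.open(part_path, chunk.zero? ? "wb" : "ab") { |f| f.write(blob.read) }

    # After the last chunk, move the temp file to its final location.
    FileUtils.mv(part_path, Rails.root.join("uploads", name)) if chunk == chunks - 1

    head :ok
  end
end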
if so it seems there needs to be a secondary operation to process the files.
I'm not sure I understand this (perhaps you weren't meaning "automatically saved to disk").
If the config is set to allow large files, should plupload work without chunking up to the allowed file size for multiple files?
The answer is yes... and no. It should work, but then fails with some combinations of browsers / plupload runtimes once the size gets to around 100 MB. People also seem to run into problems setting up the config.
I handle small files (~15MB) and do not have to use chunking.
I would say that if you are to handle large files, chunking is the way to go.

Rails - uploading big images to Heroku

When I upload an image (stored on Amazon S3) to a Heroku app from a camera, where a photo can easily be more than 2.5 MB, the request sometimes isn't processed within 30 seconds and the user sees an Application Error message.
This is not very user-friendly behavior. How can I avoid it? I know I can buy an additional dyno, but I am not sure that is the solution. For file uploads I use the Paperclip gem.
How do you solve this situation when users upload images bigger than, say, 3 MB?
There are a couple of things you could do (in order from best to worst bet):
If you need to do a lot of post-processing on images (like resizing them), you can have all of that processing done on a worker dyno using Delayed Job; see the sketch after this list. This way you get a response back much faster, but your alternate or resized versions of the image aren't immediately available, only :original. There's a tutorial on it here: http://madeofcode.com/posts/42-paperclip-s3-delayed-job-in-rails
You could use Unicorn or one of its cousins. While it likely won't fix the image upload issue by itself, it lets you adjust how long a request can take before it times out. As an added bonus, it should also speed up the rest of your app.
You could try using CarrierWave with CarrierWaveDirect instead of Paperclip. This is kind of a shot in the dark as I've never personally used it, but it's supposedly 'better', which could mean faster, maybe? It sounds like it works in a similar way to Paperclip with delayed jobs.
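For the first option, here is a rough sketch of the pattern from that tutorial, assuming the paperclip and delayed_job gems; the Photo model and :image attachment are made-up names:

class Photo < ActiveRecord::Base
  has_attached_file :image,
                    styles:  { thumb: "100x100#", large: "1024x1024>" },
                    storage: :s3

  # Returning false from this callback makes Paperclip skip the style
  # processors during the web request, so only :original is stored.
  before_image_post_process :skip_resize_on_create
  after_create :queue_resize

  def skip_resize_on_create
    false if new_record?
  end

  def queue_resize
    delay.resize!          # Delayed::Job's delay proxy enqueues the call
  end

  def resize!
    image.reprocess!       # Paperclip regenerates :thumb and :large in the worker
  end
end

For the Unicorn option, the relevant knob is simply the timeout setting in config/unicorn.rb.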

speed up uploads from Dragonfly to S3

(Rails 3.2 on Heroku)
For handling image uploads in Rails I switched from Paperclip to Dragonfly because I like to be able to generate thumbnails dynamically, when they are requested for the first time.
However, it seems that uploading attached files to S3 (using S3DataStore) is much slower than it was with Paperclip.
This is how an upload looks in a New Relic transaction trace (screenshot not reproduced here).
Anyone have experience in speeding this up?
That's a really surprising benchmark; is the server doing the file uploading on EC2 and also in the same region as your S3 bucket? How big are the generated thumbnails?
Those questions aside, doing any kind of thumbnail generation during a response is probably not a great idea: it'll add some time to each page load where thumbnails need to be generated, and even if that time isn't 3 seconds it'll still be something. I'd process images asynchronously with a gem like Delayed Paperclip. Though you won't save on storage space, as you would with CarrierWave, your response times will be vastly improved.
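If you do move back to Paperclip, the Delayed Paperclip gem mentioned above wraps this pattern up; if I remember its usage correctly, it is roughly this (the model and attachment names are hypothetical):

class Photo < ActiveRecord::Base
  has_attached_file :image, styles: { thumb: "100x100#" }, storage: :s3
  process_in_background :image   # thumbnails are generated by a background job
end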

Why is Performing Multi-part Uploads to S3 on iOS not supported?

Problem statement:
I want to upload a large binary (such as an audio clip) from an iOS app to S3, and I'd like to make the app's handling of disconnects (or low connectivity) as robust as possible, preferably by uploading the binary as a series of chunks.
Unfortunately, neither the AWS iOS SDK nor ASI's S3 framework seems to support multi-part uploads, or to indicate that support is planned. I realize that I can initiate a 'longish' upload using beginBackgroundTaskWithExpirationHandler: and that'll give me a window of time to complete the upload (currently 600 seconds, I believe), but what's to be done if I'm not in a position to complete the upload within that timeframe?
Aside from worrying about completing tasks within that time frame, is there a 'best practice' for how an app should resume uploads, or even just break a larger upload into smaller chunks?
I've thought about writing a library to talk to S3's REST API specifically for multi-part uploads, but this seems like a problem others have either solved already, or realized needn't be solved (perhaps because it's completely inappropriate for the platform).
Another (overly complicated) solution would be chunking the file on the device, uploading those to S3 (or elsewhere) and have them re-assembled on S3 via a server process. This seems even more unpalatable than rolling my own library for multi-part upload.
How are others handling this problem?
Apparently I was looking at some badly out of date documentation.
In AmazonS3Client, see:
- (S3MultipartUpload *)initiateMultipartUploadWithKey:(NSString *)theKey withBucket:(NSString *)theBucket
which will give you an S3MultipartUpload containing an uploadId.
You can then put together an S3UploadPartRequest using initWithMultipartUpload:(S3MultipartUpload *)multipartUpload and send that as you usually would.
S3UploadPartRequest contains an int property partNumber where you can specify the part # you're uploading.
You can write some code to do so yourself; refer to the code at http://dextercoder.blogspot.in/2012/02/multipart-upload-to-amazon-s3-in-three.html. It is core Java code, but the steps can be applied on iOS.
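For reference, and since the rest of this thread is Ruby, here is a sketch of the same three multipart steps (initiate, upload parts, complete) using the aws-sdk-s3 gem; the bucket, key, file path, and 5 MB part size are made-up examples:

require "aws-sdk-s3"

s3     = Aws::S3::Client.new
bucket = "my-bucket"
key    = "clips/recording.m4a"

# Step 1: initiate the multipart upload and remember its upload_id.
upload = s3.create_multipart_upload(bucket: bucket, key: key)

# Step 2: upload the file in parts (every part except the last must be at least 5 MB).
parts = []
File.open("recording.m4a", "rb") do |file|
  part_number = 1
  while (chunk = file.read(5 * 1024 * 1024))
    resp = s3.upload_part(bucket: bucket, key: key, upload_id: upload.upload_id,
                          part_number: part_number, body: chunk)
    parts << { etag: resp.etag, part_number: part_number }
    part_number += 1
  end
end

# Step 3: tell S3 to assemble the parts into the final object.
s3.complete_multipart_upload(bucket: bucket, key: key, upload_id: upload.upload_id,
                             multipart_upload: { parts: parts })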
