Uploading & Unzipping files to S3 through Rails hosted on Heroku? - ruby-on-rails

I'd like to be able to upload a zip file containing a number of images to my Rails application. Rails should then unzip that file and attach the images inside to my Photo model via Paperclip, so that they are ultimately stored on my Amazon S3 account (configured through Paperclip).
I'd like to do all of this on my Rails site hosted on Heroku, which unfortunately doesn't allow local storage of any kind (as far as I'm aware) for temporarily unzipping the archive before Paperclip processes the images.
How would I do this?

I would recommend uploading directly to S3, which bypasses Heroku entirely, so you're not restricted by the 30-second request timeout they enforce (which drops your upload once that limit is hit) or by the 1 GB /tmp directory limit. After the file is uploaded, you can make a POST to your Rails app with the file's name and location and then do your unzipping operation. If you'd like to use Paperclip for post-processing, I have attached a link below. If you end up uploading directly to S3, which offloads the work from your Rails server, please check out my sample projects:
Sample project using Rails 3, Flash and MooTools-based FancyUploader to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-FancyUploader
Sample project using Rails 3, Flash/Silverlight/GoogleGears/BrowserPlus and jQuery-based Plupload to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-Plupload
Here is the link for Paperclip post-processing, using images as the example:
http://www.railstoolkit.com/posts/fancyupload-amazon-s3-uploader-with-paperclip

dmagkic is correct about the rails_root/tmp. I recommend something like the following:
Upload the files through Heroku to S3.
Set up a background job to zip the files (store the file names that you need to group).
Run the background job, which downloads the files from S3, zips them, sends the zip to S3, and removes the unzipped files.
That way your application will still be reasonably responsive during the upload process.
If you try to upload multiple files, you could write to /tmp, but make sure that all the files come across in the same POST request.

Heroku does allow writing to #{RAILS_ROOT}/tmp.
But keep in mind that the file is only guaranteed to be there for as long as the request lasts. It will probably survive longer, but you cannot rely on that. You could try to block the request while you unzip and send to S3, but you need to watch how long that takes.
It sounds to me like you need a Flash uploader that can unzip and send to S3 without going through Heroku.
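Since files under tmp are only safe for a short time, one way to treat that space is as a scratch directory that is created, used, and removed within a single unit of work. The following is a minimal sketch using only Ruby's standard library. The helper name with_scratch_dir and its base parameter are made up for illustration, and the unzip and S3 steps are left as a comment rather than implemented.

```ruby
require "tmpdir"
require "fileutils"

# Yields a fresh scratch directory and removes it afterwards, even if the
# block raises. `base` is a hypothetical parameter; in a Rails app on Heroku
# it might be Rails.root.join("tmp").
def with_scratch_dir(base = Dir.tmpdir)
  FileUtils.mkdir_p(base)
  Dir.mktmpdir("unzip-", base.to_s) { |dir| yield dir }
end

# Demo: write a file, use it, and confirm nothing is left behind.
scratch_path = nil
bytes_written = with_scratch_dir do |dir|
  scratch_path = dir
  path = File.join(dir, "photo.jpg")
  File.binwrite(path, "fake image bytes")
  # A real job would unzip here and push each file to S3 before returning.
  File.size(path)
end

puts "wrote #{bytes_written} bytes; directory still present? #{Dir.exist?(scratch_path)}"
```

Because the block form of Dir.mktmpdir deletes the directory when the block exits, nothing in this sketch depends on files surviving past the request or job that created them.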

Related

Unzip a file and upload all its content to S3 using Rails on Heroku

I am building an API in Rails, and one of the calls will get a zip file containing a static HTML site, upload it to S3, and return the public URL.
What's the best way to approach this? I'm thinking of unzipping the file in Heroku's tmp directory and using s3_uploader to upload all its content to S3.
Is it worth using Carrierwave?
Nobody answered, so I'll quickly explain how I approached this issue.
I'm using Carrierwave to upload the zip file to S3, and I'm calling the unzip method asynchronously with the delayed_job gem. This is pretty well explained on Heroku's dev center page.
The unzip method downloads the zip file from S3 and extracts it into Heroku's tmp folder, then uploads the extracted files back to S3 using the s3_uploader gem. This works quite nicely. The only thing I still have to sort out at some point is deleting the unzipped folder on S3 when a model entity gets deleted.
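The "upload the extracted folder back to S3" step comes down to walking the extracted tree and giving each file a key. Here is a rough sketch of that mapping in plain Ruby. It does not use the s3_uploader gem's API: the uploader is a stand-in lambda, and the method name each_upload and the prefix "sites/42" are invented for the example.

```ruby
require "find"
require "pathname"
require "tmpdir"
require "fileutils"

# Walks an extracted directory and yields [s3_key, local_path] for every
# regular file. `prefix` is a hypothetical per-record key prefix.
def each_upload(root, prefix)
  return enum_for(:each_upload, root, prefix) unless block_given?

  root = Pathname.new(root)
  Find.find(root.to_s) do |path|
    next unless File.file?(path) # skip directories
    relative = Pathname.new(path).relative_path_from(root).to_s
    yield "#{prefix}/#{relative}", path
  end
end

# Demo with a stand-in uploader that only records the keys it was given.
uploaded_keys = []
fake_uploader = ->(key, _path) { uploaded_keys << key }

Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "css"))
  File.write(File.join(dir, "index.html"), "<h1>hi</h1>")
  File.write(File.join(dir, "css", "site.css"), "body {}")
  each_upload(dir, "sites/42") { |key, path| fake_uploader.call(key, path) }
end

uploaded_keys.sort!
puts uploaded_keys
```

Keeping every file of one site under a single prefix is one possible way to make the later cleanup simpler, since deleting the model entity would then mean deleting everything under that prefix.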

Paperclip: migrating from file system storage to Amazon S3

I have a RoR website where users can upload photos. I use the Paperclip gem to upload the photos and store them on the server as files. I am planning to move to Amazon S3 for storing the photos, so I need to move all my existing photos from the server to Amazon S3. Can someone tell me the best way to move the photos? Thanks!
You'll want to log into your AWS Console and create a bucket structure to hold your images. Neither S3 nor Paperclip has any tools for bulk migration from the file system to S3, so you'll need to use the tool s3cmd for that. In particular, you're interested in the s3cmd sync command, something along the lines of:
s3cmd sync ./public/system/images/ s3://imagesbucket
If you have any image URLs hard-coded into your database (a la markdown/template code), this might be a little tricky. One option would be to manually update your URLs to point to the new bucket. Alternatively, you can use rack-rewrite.
You can easily do this by creating a bucket on Amazon S3 that has the same folder structure as your public directory on your Rails app.
So say for instance, you create a new bucket on Amazon S3 called MyBucket and it has a folder in it called images. You'd just move all of your images within your Rails app's images folder over to that new bucket's images folder.
Then you can set up your app to use an asset host like this answer describes: is it good to use S3 for Rails "public/images" and there an easy way to do it?
If you are using image_tag or other tag helpers (javascripts, stylesheets, etc), then it will use that asset_host for production environments and properly generate the URL to your S3 bucket.
I found this script, which takes care of moving the images to an Amazon S3 bucket using a rake task.
https://gist.github.com/924617

Using Paperclip to direct upload files to S3

So I've got Paperclip set up with Uploadify to upload things to S3. I have made my setup so that stuff gets uploaded directly to S3, and when it's done I post the results to my web server.
All I get back is the file name and size. Am I supposed to build my own processor or before_post_process method to "download" the file from S3 in order to process it? Or am I missing something, and Uploadify should have provided me with a stream containing the file after it was done posting to S3?
How do you guys go about direct uploads to S3 and then notifying your Paperclip-backed model? Do you have to pull files from the server and do post-processing on them, or will Paperclip handle all of that?
Here are a couple blog posts describing how to do it...
http://www.railstoolkit.com/posts/uploading-files-directly-to-amazon-s3-using-fancyupload
http://www.railstoolkit.com/posts/fancyupload-amazon-s3-uploader-with-paperclip
They use FancyUploader (which uses MooTools/Flash) to upload directly to S3, bypassing Heroku and its dreaded 30-second request timeout altogether. They then use DelayedJob to queue up post-processing tasks like thumbnailing, and Paperclip to do the actual processing of the files.
If I can get this working with CarrierWave, I will post up a project on GitHub to share (in a week or so once I get time)
Update:
Sample project using Rails 3, Flash and MooTools-based FancyUploader to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-FancyUploader
Sample project using Rails 3, Flash/Silverlight/GoogleGears/BrowserPlus and jQuery-based Plupload to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-Plupload
I will add the post-processing example once I have time.
You can either create a processor or use the callback methods, but the file will definitely be on your server before going to S3.
If you are in the callback method for example you can access it using something like:
self.file.to_file
Once processing and uploading are done, the file will be deleted from your server. You don't need to do anything to notify or post-process. Paperclip will handle it.

Uploading to s3, using s3 servers

Does anyone have any sample code (preferably in Rails) that uploads to S3 using S3's servers?
Again, I mean uploading directly to S3, where the actual upload/streaming is also performed on Amazon's servers.
Requirements:
Plupload, jQuery
Idea:
Authorize Upload via your app (sign it on server-side)
Use the signed request to upload the file to S3
Notify your app that the upload is done
Check whether S3 has received the file
I posted the code as a gist at https://gist.github.com/759939. It lacks comments, and you might run into some issues due to missing methods (I had to rip it from our codebase).
stored_file.rb contains a model for your DB. It has many of Paperclip's helper methods inlined (we used Paperclip before we switched to direct upload to S3).
I hope you can use it as a sample to get your stuff running.
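The "sign it on server-side" step can be sketched without any gems. The code below is not taken from the gist. It is a minimal illustration of building the form fields for a browser POST to S3, assuming AWS Signature Version 4. The method name s3_post_fields, the credentials, the bucket, and the key are all placeholders, and a real policy would usually add further conditions such as a size limit.

```ruby
require "openssl"
require "json"

# Builds the form fields a browser would send when POSTing a file straight
# to S3. All argument values used below are placeholders, not real ones.
def s3_post_fields(access_key:, secret_key:, region:, bucket:, key:,
                   now: Time.now, expires_in: 600)
  now        = now.getutc
  date       = now.strftime("%Y%m%d")
  amz_date   = now.strftime("%Y%m%dT%H%M%SZ")
  credential = "#{access_key}/#{date}/#{region}/s3/aws4_request"

  policy = {
    "expiration" => (now + expires_in).strftime("%Y-%m-%dT%H:%M:%S.000Z"),
    "conditions" => [
      { "bucket" => bucket },
      { "key" => key },
      { "x-amz-algorithm" => "AWS4-HMAC-SHA256" },
      { "x-amz-credential" => credential },
      { "x-amz-date" => amz_date }
    ]
  }
  # pack("m0") is Base64 encoding without line breaks.
  encoded_policy = [JSON.generate(policy)].pack("m0")

  # Derive the signing key: date, region, service, then a fixed terminator.
  signing_key = [date, region, "s3", "aws4_request"].reduce("AWS4#{secret_key}") do |k, part|
    OpenSSL::HMAC.digest("SHA256", k, part)
  end
  signature = OpenSSL::HMAC.hexdigest("SHA256", signing_key, encoded_policy)

  {
    "key" => key,
    "policy" => encoded_policy,
    "x-amz-algorithm" => "AWS4-HMAC-SHA256",
    "x-amz-credential" => credential,
    "x-amz-date" => amz_date,
    "x-amz-signature" => signature
  }
end

fields = s3_post_fields(
  access_key: "AKIDEXAMPLE", secret_key: "not-a-real-secret",
  region: "us-east-1", bucket: "my-bucket", key: "uploads/photos.zip",
  now: Time.utc(2015, 12, 29, 12, 0, 0)
)
puts fields["x-amz-credential"]
```

The app would hand these fields to the uploader (Plupload in this answer), which sends them along with the file. The secret key never leaves the server. Only the signature derived from it does.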
If you are using Rails 3, please check out my sample projects:
Sample project using Rails 3, Flash and MooTools-based FancyUploader to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-FancyUploader
Sample project using Rails 3, Flash/Silverlight/GoogleGears/BrowserPlus and jQuery-based Plupload to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-Plupload
To simply copy files, this is easy to use:
Smart Copy Script into S3
Amazon wrote a Ruby library for the S3 REST API. I haven't used it yet.
http://amazon.rubyforge.org/

Large file download for a Rails project

One client project will go online in two months. One of the changed requirements is to support worldwide downloads of large files for their customers (10 to 15 MB per RAW camera file, with an expected 1000 to 5000 file downloads per day). The process will be:
there is an upload screen that uses Paperclip to store files in the local Rails public folder
an hourly task uploads the files to web storage (S3?)
the download URL is updated from the Paperclip URL to the web storage URL
Questions:
is there a gem/plug-in for this purpose?
if not, is there any gem/plug-in for S3 to recommend?
Questions about the storage provider:
is S3 recommended?
or other service to recommend?
The bottom line is: the client's web server does not and will not have the bandwidth to handle the downloads.
Thanks
I don't think there is anything that will do all of this out of the box for you. Paperclip will push files synchronously to S3 on upload, so you will need to make this asynchronous yourself.
S3 is rock-solid, I have used it in production on a number of projects. Totally recommended.
You can upload files directly to S3 which may help by reducing the double handling of the file (no longer need to upload to your app before pushing to Amazon):
http://developer.amazonwebservices.com/connect/entry.jspa?categoryID=139&externalID=1434
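If you keep the "upload locally first, push to S3 later" flow from the question, the deferred step could look roughly like the sketch below. Everything here is hypothetical: the Download struct stands in for a database row, sweep is an invented name for one pass of the hourly task, and the uploader is a lambda that returns a made-up URL instead of talking to S3.

```ruby
require "tempfile"

# Stand-in for a database row: where the file lives locally and, once it has
# been pushed, where it lives remotely.
Download = Struct.new(:local_path, :remote_url)

# One pass of the hourly task: upload every file that has no remote URL yet
# and record the URL the uploader returns. Returns the number of files moved.
def sweep(records, uploader)
  pending = records.select { |r| r.remote_url.nil? && File.file?(r.local_path) }
  pending.each do |r|
    r.remote_url = uploader.call(File.basename(r.local_path), r.local_path)
  end
  pending.size
end

# Demo with a temporary file and a fake uploader.
raw = Tempfile.new(["shot", ".raw"])
records = [
  Download.new(raw.path, nil),
  Download.new("/already/moved.raw", "https://files.example.com/moved.raw")
]
fake_uploader = ->(key, _path) { "https://files.example.com/#{key}" }

moved = sweep(records, fake_uploader)
raw.close!
puts "moved #{moved} file(s)"
```

Running the pass from a scheduler or a background worker keeps the slow S3 transfer out of the web request, which is the asynchronous behaviour Paperclip does not give you by default.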
The aws-s3 and delayed_job gems are probably what you want.
gem install aws-s3
S3 is popular and widely used as far as I am aware.
If you end up going the route of uploading directly to S3 which offloads the work from your Rails server and makes it asynchronous, please check out my sample projects:
Sample project using Rails 3, Flash and MooTools-based FancyUploader to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-FancyUploader
Sample project using Rails 3, Flash/Silverlight/GoogleGears/BrowserPlus and jQuery-based Plupload to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-Plupload

Resources