Intercept Paperclip upload before S3 - ruby-on-rails

I have a Rails 4.1.1 app with file upload through Paperclip to Amazon S3. I'd like to do some processing on the file when it's uploaded, and I'd like that processing to happen before the file is actually sent to S3 so that everything is faster; otherwise I'd have to upload the file to S3, download it again, and then process it.
So, how can I create a file somewhere in my tmp/ folder for processing, from the form submitted by the user?
Any help would be appreciated, I could find no reference on the web for such a need.
Thanks in advance

Images are uploaded to your application before being stored in S3. This allows your models to perform validations and other processing before being sent to S3.
So I would go with a custom Paperclip::Processor, or with Paperclip callbacks like before_post_process (usually used for validation).
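A minimal sketch of such a processor, assuming Paperclip's convention of loading processors from lib/paperclip_processors; the Preprocess name and the copy-only body are placeholders for your real processing:

    # lib/paperclip_processors/preprocess.rb
    module Paperclip
      class Preprocess < Processor
        def make
          ext = File.extname(@file.path)
          dst = Tempfile.new([File.basename(@file.path, ext), ext])
          dst.binmode
          # The file handed to a processor is already a local tempfile,
          # so anything done here happens before the S3 upload.
          @file.rewind
          IO.copy_stream(@file, dst)   # your real processing instead of a copy
          dst.rewind
          dst
        end
      end
    end

    # In the model; running the processor on an :original style replaces
    # the stored original with the processed result (a common Paperclip
    # trick; adjust to taste).
    class Upload < ActiveRecord::Base
      has_attached_file :data,
                        storage: :s3,
                        processors: [:preprocess],
                        styles: { original: {} }
    end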

Related

Rails, Amazon S3 storage, CarrierWave-Direct and delayed_job - is this right?

I've just discovered that Heroku doesn't have long-term file storage, so I need to move to using S3 or similar. There are a lot of new bits and pieces to get my head around, so: have I understood correctly how direct upload to S3 using CarrierWave-Direct, followed by processing with delayed_job, should work with my Rails app?
What I think should happen if I code this correctly is the following:
- I sign up for an S3 account, set up my bucket(s), and get the authentication details etc. that I will need to program in (suitably hidden from my users).
- I make sure the bucket's cross-domain (CORS) whitelist is set up so that it doesn't block my uploads (and later downloads).
- I use CarrierWave and CarrierWave-Direct (or similar) to handle uploads so they don't tie up my app.
- S3 creates random 'filename' keys, so I don't need to worry about multiple users uploading files with the same name and files getting overwritten; if I care about the original names I can store them as metadata.
- CarrierWave-Direct redirects the user's browser to an 'upload completed' URL after the upload, from where I can either create the delayed_job or pop up a 'sorry, it went wrong' notification.
- At this point the user knows that the job will be attempted and they move on to other stuff.
- My delayed_job task accesses the file using the S3 API and can delete the input file when completed.
- delayed_job completes and notifies the user in the usual way, e.g. an e-mail.
Is that it or am I missing something? Thanks.
You have a good understanding of the process you need. To throw one more layer of complexity at you: you should wrap all of it in Rails' newer ActiveJob. ActiveJob simply facilitates background processing inside Rails via the processor of your choosing (in your case, Delayed Job). You can then create jobs via a Rails generator:
bin/rails g job process_this_thing
ActiveJob offers a "Rails way" of handling jobs, but it also allows you to switch processors with less hassle.
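A minimal sketch of how that wiring might look; Thing and process_versions! are hypothetical stand-ins for your model and your real work:

    # config/application.rb: point ActiveJob at Delayed Job.
    config.active_job.queue_adapter = :delayed_job

    # app/jobs/process_this_thing_job.rb (the generator creates the
    # skeleton; the body here is made up).
    class ProcessThisThingJob < ActiveJob::Base
      queue_as :default

      def perform(thing_id)
        thing = Thing.find(thing_id)
        thing.process_versions!   # placeholder for your real processing
      end
    end

    # Enqueue it from a controller or model callback:
    # ProcessThisThingJob.perform_later(thing.id)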
So, you create a CarrierWave uploader (see the CarrierWave docs) and attach it to a model. For carrierwave_direct you need to disassociate the file field from your model's form and move it to its own form (use the form URL method provided by carrierwave-direct).
You can choose to upload the file, then save the record. Or, save the record and then process the file. The set-up process is significantly different depending on which you choose.
Carrierwave and carrierwave-direct know where to save the file based on the fog credentials you put in the carrierwave initializer and by using the store_dir path, if set, in the uploader.
CarrierWave provides the uploader, which defines versions, etc. Carrierwave_direct facilitates uploading directly to your S3 bucket and processing versions in the background. ActiveJob, via Delayed Job, provides the background processing. Fog is the link between CarrierWave and your S3 bucket.
You should add a boolean flag to your model that is set to true when carrierwave_direct uploads your image and then set to false when the job finishes processing the versions. That way, instead of a broken link (while the job is running and not yet complete), your view will show something like 'this thing is still processing...'.
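A sketch of that flag pattern, assuming a Thing model with a CarrierWave image uploader; all names here are invented:

    # Migration: the record starts out flagged as processing.
    class AddProcessingToThings < ActiveRecord::Migration
      def change
        add_column :things, :processing, :boolean, default: true
      end
    end

    # Background job: rebuild the versions, then clear the flag.
    class ProcessVersionsJob < ActiveJob::Base
      def perform(thing_id)
        thing = Thing.find(thing_id)
        thing.image.recreate_versions!   # CarrierWave regenerates versions
        thing.update(processing: false)
      end
    end

    # In the view:
    # thing.processing? ? 'Still processing...' : image_tag(thing.image.url(:thumb))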
RailsCast is the perfect resource for completing this task. Check this out: https://www.youtube.com/watch?v=5MJ55_bu_jM

Rails, Heroku, S3, and static resources

I am working on a Rails web application, running on a Heroku stack, that looks after documents attached to a Rails database object. For example, suppose we have an object product_i of class/table Product/products, and product_i_prospectus.pdf is the associated prospectus, where each product has a single prospectus.
Since I am working on Heroku, and thus do not have root access, I plan to use Amazon S3 to store the static resource associated with product_i. So far, so good.
Now suppose that product_i_attributes.txt is also a file I want to upload, and indeed I want to actually fill out information in the product_i object (i.e. the row in the table corresponding to product_i), based on information in the file product_i_attributes.txt.
In a sentence: I want to create, or alter, database objects, based on the content of static text files uploaded to my S3 bucket.
Strictly speaking, I don't actually need to access the files once they are in the bucket; I just need to create some records out of a text file.
I have done something similar with CSV files. I would not try to process the file directly at upload, as that can be resource-intensive.
My solution was to upload the file to S3 and then call a background job (delayed_job, resque, etc.) that processes the CSV after upload. You could then have the job delete the file from S3 once it has been processed, if you no longer need it.
For Heroku this will require that you add a worker (if you don't already have one) to process the background jobs that will process the text files.
Take a look at the aws-sdk gem (the AWS SDK for Ruby). This will allow you to access your S3 bucket.
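A rough sketch of that flow using the modern aws-sdk-s3 gem; the bucket, region, key, and the Product mapping are all assumptions:

    require 'aws-sdk-s3'
    require 'csv'

    class ImportAttributesJob < ActiveJob::Base
      BUCKET = 'my-bucket'   # placeholder bucket name

      def perform(key)
        s3   = Aws::S3::Client.new(region: 'us-east-1')
        body = s3.get_object(bucket: BUCKET, key: key).body.read

        # Create or update records from the uploaded text file.
        CSV.parse(body, headers: true) do |row|
          Product.create!(row.to_h)   # or find and update an existing row
        end

        # Clean up the source file once processing has succeeded.
        s3.delete_object(bucket: BUCKET, key: key)
      end
    end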

Mananging upload of images to create custom pdfs on heroku - right tools

I'm designing an app which allows users to upload images (max 500 KB per image, roughly 20 images) from their hard drive so they can make custom board games (e.g. snakes and ladders) in PDF format. The PDFs will be created instantly with Prawn and then made available for immediate download.
Neither the uploaded images nor the generated PDFs need to be saved permanently on my app's side. The moment the user downloads the PDF, they are no longer needed.
Heroku doesn't support saving files to the filesystem (it does allow writing to the tmp directory but says you shouldn't rely on it, which rules that out for me). I'm wondering what tools or services I should be looking into to get around this. I've looked into Paperclip, and I'm wondering if it's right for this type of job.
Paperclip is on the right track, but the key insight is you need to use the S3 storage backend (Paperclip uses the FS by default which as you've noticed is no good on Heroku). It's pretty handy; instead of flushing writes out to the file system, it uses the AWS::S3 gem to upload them to S3. You can read more about it in the rdoc here: http://github.com/thoughtbot/paperclip/blob/master/lib/paperclip/storage/s3.rb
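Switching the backend is mostly model configuration; for example (the bucket and env var names below are placeholders):

    class Boardgame < ActiveRecord::Base
      has_attached_file :image,
        storage: :s3,
        s3_credentials: {
          bucket:            ENV['S3_BUCKET_NAME'],
          access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
          secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
        },
        path: ':class/:id/:style/:filename'   # key layout inside the bucket
    end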
Here's how the flow would work:
I'd let your users upload their multiple source images. Here's an article on allowing multiple attachments to one model with paperclip: http://www.cordinc.com/blog/2009/04/multiple-attachments-with-vali.html.
Then, when you're ready to generate the PDF (probably in a background job, right?), you download all the source images to somewhere in tmp/ (make sure the directory is based on your model ID or something, so that if two people do this at once the files don't get stepped on). Once you've got all the images downloaded, you can generate your PDF. I know this is using the file system, but as long as you do all your filesystem interactions in one request or job cycle, it will work; your files will still be there. I use this method in a couple of production web apps. You can't count on tmp/ being there between requests, but within one it's reliably there.
Storing your generated PDF on S3 with paperclip makes sense too, since then you can just hand your users the S3 URL. If you want you can make something to clear the files off every so often if you don't want to pay the S3 costs, but they should be trivial.
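A sketch of that job cycle, fetching the images with open-uri and stitching the PDF with Prawn; the model, attachment name, and page layout are invented:

    require 'open-uri'
    require 'prawn'
    require 'fileutils'

    def build_pdf(game)   # `game` is a hypothetical model with many images
      dir = Rails.root.join('tmp', 'boardgames', game.id.to_s)
      FileUtils.mkdir_p(dir)

      # Pull each source image from S3 into tmp/ within this one job cycle.
      locals = game.images.map do |img|
        local = dir.join("image_#{img.id}#{File.extname(img.asset.path)}")
        File.open(local, 'wb') { |f| f.write(URI.open(img.asset.url).read) }
        local.to_s
      end

      # Generate the PDF from the local copies, one image per page.
      pdf_path = dir.join('boardgame.pdf').to_s
      Prawn::Document.generate(pdf_path) do |pdf|
        locals.each_with_index do |image, i|
          pdf.start_new_page unless i.zero?
          pdf.image image, fit: [pdf.bounds.width, pdf.bounds.height]
        end
      end
      pdf_path
    end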
Paperclip sounds like an ideal candidate. It will save images in RAILS_ROOT/public/system/, which is both persistent and private (shouldn't be able to be enumerated on shared hosting).
You can configure it to produce thumbnails of your images if you wish.
And it can remove the images it manages when the associated model is destroyed - after your user downloads their PDF, and you delete the record from the database.
Prawn might not be appropriate, depending on the complexity of the PDFs you need to generate. If you have $$$, go for PrinceXML and the princely gem. I've had some success with wkhtmltopdf, which generates PDFs from a Webkit render of HTML/CSS - but it doesn't support any of the advanced page manipulation stuff that Prince does.

How can I prevent double file uploading with Amazon S3?

I decided to use Amazon S3 for document storage for an app I am creating. One issue I ran into is that while the files need to be uploaded to S3, I also need to create a document object in my app so my users can perform CRUD actions.
One solution is to allow a double upload. A user uploads a document to the server my Rails app lives on; I validate it and create the object, then pass it on to S3. One issue with this is that progress indicators become more complicated: most out-of-the-box plugins would show the client that the file has finished uploading because it is on my server, but then there would be a noticeable delay while the file goes from my server to S3. It also uses bandwidth that seems unnecessary.
The other solution I am thinking about is to upload the file directly to S3 with one AJAX request, and when that is successful, make a second AJAX request to store the object in my database. One issue here is that I would have to validate the file after it is uploaded which means I have to run some clean up code in S3 if the validation fails.
Both seem equally messy.
Does anyone have something more elegant working that they would not mind sharing? I would imagine this is a common situation with "cloud storage" being quite popular today. Maybe I am looking at this wrong.
Unless there's a particular reason not to use Paperclip, I'd highly recommend it. Used in conjunction with delayed_job and delayed_paperclip, it lets the user upload the file to your server's filesystem, where you perform whatever validation you need; a delayed job then processes the file and stores it on S3. Really, really easy to set up, and a better user experience.
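A minimal sketch of that setup with the delayed_paperclip gem; the attachment name, styles, and credentials path are placeholders:

    class Document < ActiveRecord::Base
      has_attached_file :attachment,
                        storage: :s3,
                        s3_credentials: Rails.root.join('config', 's3.yml'),
                        styles: { thumb: '100x100>' }

      # Validate while the file is still on the local filesystem.
      validates_attachment_content_type :attachment,
                                        content_type: %w[application/pdf image/jpeg]

      # delayed_paperclip pushes post-processing into a background job,
      # so the request returns as soon as the original is saved.
      process_in_background :attachment
    end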

How can you send a file to S3 after all processing is done using paperclip in rails?

I have a Rails app with Video and Image models. Both use SWFUpload for progress feedback and queued uploading, so files are uploaded to TempImage and TempVideo models, and when the actual ActiveRecord Video and Image records are saved, the temps are moved over.
For images, the different styles are created with the default Paperclip processor. Videos, after they are uploaded, are queued in the background (using Starling and Workling) to be transcoded to FLV format and to have a JPG thumbnail created.
So my question is this: I want to be able to do all these conversions on the local server, but I'd like the files to be stored on S3 in the end to preserve space and bandwidth on my server. How can I use the S3 backend for paperclip to do this? Or instead should I have a background task that does the upload to S3 independently of paperclip after all the after_save tasks are done which updates the paperclip attributes to reflect the new S3 path?
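A rough sketch of the second approach described above (process locally, then have a background task push the finished files to S3 and record that the move happened), using the modern aws-sdk-s3 API; the bucket, key scheme, attachment name, and flag column are all invented:

    require 'aws-sdk-s3'

    class PushVideoToS3Job < ActiveJob::Base
      BUCKET = 'my-bucket'   # placeholder

      def perform(video_id)
        video = Video.find(video_id)
        s3    = Aws::S3::Resource.new(region: 'us-east-1')

        # Upload the transcoded FLV and the JPG thumbnail from local storage.
        [:original, :thumb].each do |style|
          local = video.media.path(style)   # Paperclip's local file path
          key   = "videos/#{video.id}/#{style}/#{File.basename(local)}"
          s3.bucket(BUCKET).object(key).upload_file(local)
          File.delete(local)                # free space on the app server
        end

        video.update(stored_on_s3: true)    # views build S3 URLs off this flag
      end
    end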
