Paperclip before upload callback - ruby-on-rails

I am working on a Rails application that uses the Timecop gem for "time traveling", and I have a problem... When I try to upload a file to Amazon S3 using Paperclip, S3 returns:
<Error><Code>RequestTimeTooSkewed</Code><Message>The difference between the request time and the current time is too large.</Message>
I think a "before_upload" callback on paperclip will help me to reset time to the real time, perform upload and travel back in the past... There is such a callback? What I found till now was just after_ callbacks :(

Try doing it in the before_post_process callback, though I'm not sure how to add that hook dynamically without modifying your code specifically for testing. I would see if you could reset the time before you create your model; that would probably be the simplest.
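Something along these lines, purely as a hedged sketch (the model, attachment name, fixture path, and the_past are made up; before_post_process is a callback Paperclip does provide):
class Document < ActiveRecord::Base
  has_attached_file :upload, storage: :s3
  before_post_process :leave_the_time_machine

  def leave_the_time_machine
    Timecop.return if defined?(Timecop)   # come back to real time before talking to S3
    true                                  # returning false would halt processing
  end
end
# Or, simpler, in the test itself: return to the present before the upload,
# then resume time travelling afterwards.
Timecop.return
document = Document.create!(upload: File.open('spec/fixtures/example.pdf'))
Timecop.travel(the_past)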

Related

Rails, Amazon S3 storage, CarrierWave-Direct and delayed_job - is this right?

I've just discovered that Heroku doesn't have long-term file storage, so I need to move to using S3 or similar. There are a lot of new bits and pieces to get my head around, so have I understood how direct upload to S3 using CarrierWave-Direct, followed by processing with delayed_job, should work with my Rails app?
What I think should happen if I code this correctly is the following:
I sign up to an S3 account, set-up my bucket(s) and get the authentication details etc that I will need to program in (suitably hidden from my users)
I make sure the direct-upload whitelist / cross-domain (CORS) rules on the bucket don't block my uploads (and later downloads)
I use CarrierWave & CarrierWave-direct (or similar) to create my uploads to avoid loading up my app during uploads
S3 will create random access ('filename') information so I don't need to worry about multiple users uploading files with the same name and the files getting overwritten; if I care about the original names I can use metadata to store them.
CarrierWave-Direct redirects the user's browser to an 'upload completed' URL after the upload, from where I can either create the delayed_job or pop up the 'sorry, it went wrong' notification.
At this point the user knows that the job will be attempted and they move on to other stuff.
My delayed_job task accesses the file using the S3 APIs and can delete the input file when completed.
delayed_job completes and notifies the user in the usual way e.g. an e-mail.
Is that it or am I missing something? Thanks.
You have a good understanding of the process you need. To throw one more layer of complexity at you: you should wrap all of it in Rails' newer ActiveJob. ActiveJob simply facilitates background processing inside Rails via the processor of your choosing (in your case Delayed Job). Then, you can create jobs via a Rails generator:
bin/rails g job process_this_thing
ActiveJob offers a "Rails way" of handling jobs, but it also allows you to switch processors with less hassle.
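For example, a sketch only (the job, model, and method names here are placeholders):
# config/application.rb - tell ActiveJob to use Delayed Job under the hood
config.active_job.queue_adapter = :delayed_job
# app/jobs/process_this_thing_job.rb (as generated above)
class ProcessThisThingJob < ActiveJob::Base
  queue_as :default

  def perform(thing_id)
    thing = Thing.find(thing_id)
    thing.process_versions!   # hypothetical method that builds the versions
  end
end
# enqueue it from wherever the upload completes
ProcessThisThingJob.perform_later(thing.id)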
So, you create a CarrierWave uploader (see the CarrierWave docs). Then, attach that uploader to a model. For carrierwave_direct you need to remove the file field from your model's form and move it to its own form (use the form URL method provided by carrierwave-direct).
You can choose to upload the file, then save the record. Or, save the record and then process the file. The set-up process is significantly different depending on which you choose.
Carrierwave and carrierwave-direct know where to save the file based on the fog credentials you put in the carrierwave initializer and by using the store_dir path, if set, in the uploader.
CarrierWave provides the uploader, which defines versions, etc. carrierwave_direct facilitates uploading directly to your S3 bucket and processing versions in the background. ActiveJob, via Delayed Job, provides the background processing. Fog is the link between CarrierWave and your S3 bucket.
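Roughly, the pieces fit together like this (a sketch with made-up names; the fog credentials and store_dir decide where CarrierWave writes on S3):
# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
  config.fog_credentials = {
    provider:              'AWS',
    aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
  }
  config.fog_directory = ENV['S3_BUCKET']
end
# app/uploaders/file_uploader.rb
class FileUploader < CarrierWave::Uploader::Base
  include CarrierWaveDirect::Uploader   # adds the direct-upload form helpers

  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end
end
# app/models/upload.rb
class Upload < ActiveRecord::Base
  mount_uploader :file, FileUploader
end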
You should add a boolean flag to your model that is set to true when carrierwave_direct uploads your image and then set to false when the job finishes processing the versions. That way, instead of a broken link (while the job is running and not yet complete), your view will show something like 'this thing is still processing...'.
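Concretely, something like this (hedged; the processing column and job name are invented):
upload.update(processing: true)              # right after carrierwave_direct hands back the key
class ProcessVersionsJob < ActiveJob::Base   # runs via Delayed Job
  def perform(upload_id)
    upload = Upload.find(upload_id)
    upload.file.recreate_versions!           # CarrierWave builds the versions
    upload.update(processing: false)
  end
end
# In the view, show the versions only when upload.processing? is false;
# otherwise render "this thing is still processing...".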
RailsCast is the perfect resource for completing this task. Check this out: https://www.youtube.com/watch?v=5MJ55_bu_jM

Paperclip: copy_to_local_file called upon every update (of unrelated attributes)

I'm using paperclip 4.1.0 with Amazon S3.
I was wondering why requests were so slow and found that "copy_to_local_file" is called whenever I am updating attributes of a model with an attachment, even if it's just one attribute unrelated to the attachment (in my case, a cache_count, which means that every time someone votes an instance up, the attachment is downloaded locally!).
I understand it is used in case a rollback is required, but it seems overkill when the attribute isn't directly related to the attachment.
Am I using paperclip in the wrong way, or is it something that could be improved?
Thanks for your help
Just my 2 cents:
the attachment gets downloaded locally only after ActiveRecord::Base#save is called.
Would calling save in a daily cron job instead help with the load?
Otherwise, either avoid calling copy_to_local_file where possible,
or patch Paperclip's copy_to_local_file(style, local_dest_path) so it skips downloading the attachment.
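For the specific case of bumping a counter, one workaround (a hedged sketch, assuming a cache_count column on a hypothetical Post model) is to skip callbacks and validations entirely, so Paperclip never re-examines the attachment on that update:
Post.increment_counter(:cache_count, post.id)           # single SQL UPDATE, no callbacks
post.update_column(:cache_count, post.cache_count + 1)  # also bypasses callbacks/validations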
This was an issue with Paperclip; it's fixed on the master branch!

How to use ORM (activerecord) with Carrierwave_direct?

I'm successfully using carrierwave_direct - it mounts an uploader and uploads directly to S3, yay!
However, unlike CarrierWave it does not persist a record into the DB - rather it just redirects back to a 'success_path' (standard AWS/S3 function).
Before embarking on rolling my own solution I'm curious if anyone has figured this out or has a good approach for this. I would like it to upload directly to S3 and use carrierwave to persist the record to the db.
My immediate thought is to pass params to the process which get carried back to the app, then grab those params and create the record.
Appreciate any thoughts.
All you have to do is:
set the page you want to return to on success in the new action of your controller: @uploader.success_action_redirect = 'Your_update_page'
Amazon will bring you back to this page on success and add a 'key' parameter containing the information you need to update the db.
This is very well explained in the carrierwave_direct README on GitHub.
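For reference, a rough sketch of that flow (names are invented; the exact steps are in the carrierwave_direct README), assuming a Document model with a mounted :file uploader:
# app/controllers/documents_controller.rb
def new
  @uploader = Document.new.file
  @uploader.success_action_redirect = upload_complete_documents_url   # your 'Your_update_page'
end
# S3 redirects here after a successful upload, with the object key in params[:key]
def upload_complete
  @document = Document.new
  @document.file.key = params[:key]   # hand the S3 key to the mounted uploader
  if @document.save
    redirect_to @document
  else
    render :new
  end
end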

Amazon S3, how to deal with the delay from Upload to Object availability

The app I'm building allows a user to upload a file. The file is uploaded to Amazon S3 in a private bucket.
Then users can download the file, which we allow by creating a time expiring URL:
AWS::S3::S3Object.url_for(attachment.path(style || attachment.default_style), attachment.bucket_name, :expires_in => expires_in, :use_ssl => true)
The problem we're having is that there is a short delay from upload to availability via the AWS::S3::S3Object.url_for. If users try to download the file right after the upload, Amazon errors with:
NameError (uninitialized constant Attachment::AWS):
  app/models/attachment.rb:32:in `authenticated_url'
  app/controllers/attachments_controller.rb:33:in `show'
Any ideas on how to deal with or optimize around this delay?
Thanks
I know it's been years, but for those who came here with the same issue, here is what I've found.
First of all, it's just how AWS S3 works:
A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
The best way I have found to deal with this behaviour is to wait until the uploaded object appears in the listing before allowing users to download it.
Something like:
_put_object(filename)
while True:
    if _file_exists(filename):
        break
    time.sleep(1)
To check availability we can use client.head_object or client.list_objects_v2.
Some report that list_objects_v2 responds faster.
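In a Rails app the same polling can be done with the aws-sdk-s3 gem's built-in waiter, which repeats head_object until the key is visible (a sketch; the bucket name, region, and attachment_key are placeholders):
require 'aws-sdk-s3'
s3 = Aws::S3::Client.new(region: 'us-east-1')
s3.wait_until(:object_exists, bucket: 'my-bucket', key: attachment_key) do |w|
  w.max_attempts = 10   # give up after ~10 HEAD requests
  w.delay = 1           # seconds between attempts
end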
How long of a delay are you seeing? How often is this happening?
We upload directly to s3 from the browser using https://github.com/PRX/s3-swf-upload-plugin , and by the time I get a callback that the file exists, I have never seen an error with it being not yet available.
Another thing we do is to mark the object with one state on first upload, then use an async process to validate the file, and only after it is marked valid do we go ahead and process it.
This causes a delay however, so it may not be such a great answer for you.

How can I prevent double file uploading with Amazon S3?

I decided to use Amazon S3 for document storage for an app I am creating. One issue I ran into is that while the files need to be uploaded to S3, I also need to create a document object in my app so my users can perform CRUD actions.
One solution is to allow for a double upload. A user uploads a document to the server my Rails app lives on. I validate and create the object, then pass it on to S3. One issue with this is that progress indicators become more complicated. Using most out-of-the-box plugins would show the client that the file has finished uploading because it is on my server, but then there would be a decent delay while the file goes from my server to S3. This also introduces unnecessary bandwidth (at least it does not seem necessary).
The other solution I am thinking about is to upload the file directly to S3 with one AJAX request, and when that is successful, make a second AJAX request to store the object in my database. One issue here is that I would have to validate the file after it is uploaded which means I have to run some clean up code in S3 if the validation fails.
Both seem equally messy.
Does anyone have something more elegant working that they would not mind sharing? I would imagine this is a common situation with "cloud storage" being quite popular today. Maybe I am looking at this wrong.
Unless there's a particular reason not to use Paperclip, I'd highly recommend it. Used in conjunction with delayed_job and delayed_paperclip, the user uploads the file to your server's filesystem, where you perform whatever validation you need. A delayed job then processes it and stores it on S3. Really, really easy to set up and a better user experience.
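A hedged sketch of that setup with delayed_paperclip (the model, content type, and credentials path are invented):
class Document < ActiveRecord::Base
  has_attached_file :file,
                    storage: :s3,
                    s3_credentials: Rails.root.join('config/s3.yml')
  validates_attachment_content_type :file, content_type: %r{\Aapplication/pdf\z}
  process_in_background :file   # delayed_paperclip moves post-processing into a Delayed Job
end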
