Heroku and background uploading in ruby - ruby-on-rails

Currently I have an application that uploads images to S3 in a background (Sidekiq) task. It works fine; however, I have had to "hack" together a solution, and I was curious if anyone knew of a better way to do this.
Problem:
When using Paperclip and a background job on Heroku, the worker is usually unable to access the tmp file because it is spun up on a different server. I have tried to have Paperclip use the tmp folder on Heroku, and it stores the file there, but the background tasks always return a "File not found" error.
Temp solution:
The workaround is to encode the image to a Base64 string and pass that into the perform task (disgusting, bad, horrible, large overhead).
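Roughly, the current hack looks like this (a sketch; the worker class and argument names are illustrative, not from my real code):

    require 'sidekiq'
    require 'base64'
    require 'tempfile'

    # The image is read and Base64-encoded in the web request, then decoded
    # again inside the worker, hence the large payload and overhead.
    class ImageUploadWorker
      include Sidekiq::Worker

      def perform(filename, encoded_image)
        Tempfile.create(filename) do |file|
          file.binmode
          file.write(Base64.decode64(encoded_image))
          file.rewind
          # hand the tempfile to Paperclip / S3 here
        end
      end
    end

    # In the controller:
    # ImageUploadWorker.perform_async(upload.original_filename,
    #                                 Base64.encode64(upload.read))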
Is there a better way to do this on Heroku? I don't want to save an image blob in the database, as that is just as bad a practice.

Would it be possible to use the direct-upload approach from the Heroku S3 guide, and then have a background job resize or process the image if needed?
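Something along these lines, for example with the aws-sdk-s3 gem (a sketch; the bucket name, routes, and ImageProcessingJob are placeholders):

    require 'aws-sdk-s3'

    class UploadsController < ApplicationController
      # Hand the browser a presigned PUT URL so the file never touches the dyno.
      def presign
        s3     = Aws::S3::Resource.new
        object = s3.bucket('my-bucket').object("uploads/#{SecureRandom.uuid}")
        render json: { url: object.presigned_url(:put, expires_in: 300), key: object.key }
      end

      # Called by the client after the direct upload succeeds; any resizing or
      # other processing happens in the background job, not in the request.
      def complete
        ImageProcessingJob.perform_async(params[:key])
        head :ok
      end
    end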

Related

Passing huge JSON to Sidekiq Jobs

One of the features of the application I am currently working on is photo upload. Customers upload photos in the frontend, and the photos are passed to the Rails backend and then stored on Amazon S3.
I have noticed that a huge amount of request time is spent uploading photos to S3. The photos are uploaded one by one, so the latency is multiplied. It would be great if I could somehow store photos temporarily in RAM and speed up the request.
I have thought about running a Sidekiq job with the file as a param, but according to the Sidekiq documentation passing a huge object is not good practice. How can I solve this another way?
I solved this problem by using an API to generate a presigned URL and using Cognito to upload the image to S3 and get the image link.
nginx/Puma running on machine A should save the image as a local file. Run Sidekiq on the same machine A and pass the filename to a job in a host-specific queue for Sidekiq to process. That way you can pass a file reference without worrying about which machine will process it.
Make sure Sidekiq deletes the file so you don't fill up the disk!
https://www.mikeperham.com/2013/11/13/advanced-sidekiq-host-specific-queues/
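A sketch of what that can look like (the worker class and tmp path are placeholders):

    require 'sidekiq'
    require 'socket'

    class LocalFileUploadWorker
      include Sidekiq::Worker

      def perform(path)
        File.open(path, 'rb') do |file|
          # push the file to S3 here (Paperclip, CarrierWave, aws-sdk, ...)
        end
      ensure
        File.delete(path) if File.exist?(path)  # don't fill up the disk
      end
    end

    # In the controller, after saving the upload to a local path (tmp_path is
    # a placeholder). Sidekiq on this machine must listen on a queue named
    # after the host, e.g.: bundle exec sidekiq -q default -q `hostname`
    Sidekiq::Client.push(
      'class' => LocalFileUploadWorker,
      'queue' => Socket.gethostname,
      'args'  => [tmp_path]
    )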

Does anyone know if it's possible to integrate carrierwave backgrounder (store_in_background) with Heroku?

https://github.com/lardawge/carrierwave_backgrounder
I would like to use the store_in_background method to delay storing files to S3, but I'm a little bit afraid since Heroku has a read-only filesystem, and I'm wondering if anyone has managed to do this?
It would only work if you're using Heroku's newer stack, which offers an ephemeral (writable) file system. I'd recommend something like queue_classic instead of carrierwave_backgrounder.
Queue Classic uses Postgres-specific features to deliver great performance. It also has the advantage that its queue can be manipulated by Postgres triggers/procedures, which lets you enqueue an image delete in the same query that deletes the image row.
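A rough sketch of that trigger idea, assuming queue_classic's default queue_classic_jobs table (q_name / method / args columns; verify against the version you run) and an images table with an s3_key column; ImageCleanup is a placeholder worker class:

    class AddImageCleanupTrigger < ActiveRecord::Migration
      def up
        execute <<-SQL
          CREATE FUNCTION enqueue_image_cleanup() RETURNS trigger AS $$
          BEGIN
            -- enqueue a queue_classic job in the same statement that deletes the row
            INSERT INTO queue_classic_jobs (q_name, method, args)
            VALUES ('default', 'ImageCleanup.remove_from_s3', '["' || OLD.s3_key || '"]');
            RETURN OLD;
          END;
          $$ LANGUAGE plpgsql;

          CREATE TRIGGER images_cleanup AFTER DELETE ON images
            FOR EACH ROW EXECUTE PROCEDURE enqueue_image_cleanup();
        SQL
      end

      def down
        execute 'DROP TRIGGER images_cleanup ON images; DROP FUNCTION enqueue_image_cleanup();'
      end
    end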

Timeout error from unicorn while uploading a file

I'm using Unicorn on Heroku. One of the issues I'm having is with file uploads. We use CarrierWave for uploads, and basically, even for a file that's about 2 MB in size, Unicorn times out by the time 50-60% of the upload is done.
We aren't using Unicorn when we test locally, and I don't have any issues with large files locally (though the files get uploaded to AWS using CarrierWave, just as with production and staging). However, on the staging and production servers, I see that we get a timeout.
Any strategies on fixing this issue? I'm not sure I can put this file upload on a delayed job (because I need to confirm to my users that the file has indeed been successfully uploaded).
Thanks!
Ringo
If you're uploading big files to S3 through Heroku, you can't reasonably avoid timeouts: if uploading to Heroku, transferring to S3, and processing take longer than 30 seconds in total, the request will time out. For good reason, too; a 30-second request is just crappy performance.
This blog post (and github repo) is very helpful: http://pjambet.github.io/blog/direct-upload-to-s3/
With it, you should be able to get direct-to-s3 file uploads working. You completely avoid hitting Heroku for the bulk of the upload. Using jquery-fileupload's callbacks, you can post to your application after the file is successfully uploaded, and process it in the background using delayed_job. Confirming to your users that the upload is successful is an application problem you just need to take care of.
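The confirm-and-process step can be as small as this (a sketch; the model, s3_key column, and generate_thumbnails method are placeholders):

    # jquery-fileupload's "done" callback POSTs the S3 key back to the app,
    # which records the upload and defers any processing to delayed_job.
    class AttachmentsController < ApplicationController
      def create
        attachment = current_user.attachments.create!(s3_key: params[:key])
        attachment.delay.generate_thumbnails   # delayed_job's .delay proxy
        render json: { id: attachment.id }, status: :created
      end
    end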
Sounds like your timeout is set too low. What does your unicorn config look like?
See https://devcenter.heroku.com/articles/rails-unicorn for a good starting point.
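For reference, a minimal Heroku-style config/unicorn.rb looks roughly like this; note that raising the timeout past Heroku's 30-second router limit buys you nothing:

    # config/unicorn.rb
    worker_processes Integer(ENV['WEB_CONCURRENCY'] || 3)
    timeout 25          # seconds before a stuck worker is killed
    preload_app true

    before_fork do |server, worker|
      ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
    end

    after_fork do |server, worker|
      ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
    end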

How can I upload images in bulk (about 100-200 images at a time) in Ruby on Rails?

I want to upload images (around 200 kB each) in bulk. We have multiple options such as CarrierWave, Paperclip, and others. How can I perform these uploads in bulk?
Like other things in computer science, the answer is "it depends"™. What I really mean is:
Are end users going to be uploading these? If yes, use the jQuery File Upload plugin to present an easy-to-use interface.
For storage, you can store the images on your server, or even better, upload them directly from users' computers to Amazon S3. Here is an example of Uploading Files to S3 in Ruby with Paperclip; a minimal config sketch also follows at the end of this answer.
Obviously you will need to convert this into a background job, where each image is sent with AJAX and handled in a separate job. If you don't already have a favorite queue system, I would suggest Resque or Sidekiq.
Note: if you choose to upload images directly to S3 via CORS, then your Rails server is freed from managing file uploads. Direct uploads are recommended if you have large images or a large number of files to upload.
However, direct uploads limit your ability to modify the images (resizing, etc.), so keep that in mind if you choose the direct-upload solution.
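If you go the server-side route, a minimal Paperclip + S3 setup looks roughly like this (bucket and credential names are placeholders):

    class Photo < ActiveRecord::Base
      has_attached_file :image,
        storage: :s3,
        s3_credentials: {
          bucket:            ENV['S3_BUCKET'],
          access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
          secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
        },
        # styles are processed on your server, which is exactly what you give
        # up with direct-to-S3 uploads
        styles: { thumb: '200x200>' }
    end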
TL;DR
Don't. Rails isn't optimized for bulk uploads, so do it out-of-band whenever you can.
Use FTP/SFTP
The best way to deal with large volumes of files is to use an entirely out-of-band process rather than tying up your Rails process. For example, use FTP, FTPS, SCP, or SFTP to upload your files in bulk. Once the files are on a server, post-process them with a cron job, or use inotify to kick off a rake task.
NB: Make sure you pay attention to file-locking when you use this technique.
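A sketch of the post-processing side, assuming a /data/incoming drop directory that FTP/SFTP writes into and a Photo model with a Paperclip or CarrierWave attachment:

    # lib/tasks/uploads.rake
    namespace :uploads do
      desc 'Import files dropped via FTP/SFTP'
      task import: :environment do
        Dir.glob('/data/incoming/*').each do |path|
          # crude stand-in for real file-locking: skip files still being written
          next if File.mtime(path) > 1.minute.ago
          File.open(path, 'rb') { |f| Photo.create!(image: f) }
          File.delete(path)
        end
      end
    end

Run it from cron every few minutes, or trigger it with inotify.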
Use a Queue
If you insist on doing it via Rails, don't upload hundreds of files. Instead, upload a single archive containing your files to be processed by a background job or queue. There are many alternatives here, including Sidekiq and RabbitMQ among others.
Once the archive is uploaded and the queued job submitted, your queue process can unpack the archive and do whatever else needs to be done. This type of solution scales very well.
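A sketch of the unpacking job, assuming Sidekiq and the rubyzip gem (the Photo model and its attachment are placeholders):

    require 'sidekiq'
    require 'zip'        # rubyzip
    require 'tempfile'

    class ArchiveImportWorker
      include Sidekiq::Worker

      def perform(archive_path)
        Zip::File.open(archive_path) do |zip|
          zip.each do |entry|
            next unless entry.file?
            Tempfile.create(entry.name.tr('/', '_')) do |tmp|
              tmp.binmode
              tmp.write(entry.get_input_stream.read)
              tmp.rewind
              Photo.create!(image: tmp)   # placeholder model / attachment
            end
          end
        end
      ensure
        File.delete(archive_path) if File.exist?(archive_path)
      end
    end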

Multiple Uploads to Amazon S3 from Ruby on Rails - What Background Processing System to Use?

I'm developing a Ruby on Rails application that needs to allow the user to simultaneously upload 16 high-quality images at once.
This often means somewhere around 10-20 megabytes (sometimes more), but it's the number of connections that is becoming the most pertinent issue.
The images are sent to Amazon S3 by Paperclip, which unfortunately opens and closes a new connection for each of the 16 files. Needless to say, I need to move the system to run as background processes to keep my web server from locking up, as it already does even with no traffic.
My question is: out of all the Rails-based systems for background jobs (Starling, BackgroundRb, Spawn, etc.), is there one that might fit the bill for this scenario better than the others? (I'm new to building background-processing systems anyway, so all of the available options are equally new to me.)
There's no shortage of Rails plugins for async processing, and basically all of them work fine. Personally I like Delayed Job's API best.
I wouldn't use Starling or another actual queue daemon, since for this task using the database to store any necessary state should be just fine.
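With Delayed Job, pushing the S3 transfer into the background can be as small as this sketch (model and method names are placeholders; Paperclip still does the actual S3 upload):

    class Photo < ActiveRecord::Base
      has_attached_file :image   # Paperclip, configured for S3

      # Attach the locally saved file and let Paperclip push it to S3.
      def upload_to_s3(path)
        File.open(path, 'rb') { |f| update!(image: f) }
      ensure
        File.delete(path) if File.exist?(path)
      end
      handle_asynchronously :upload_to_s3   # Delayed Job runs this in a worker
    end

    # photo.upload_to_s3(tmp_path)   # returns immediately; work happens later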
This might help!
http://aaronvb.com/blog/2009/7/19/paperclip-amazon-s3-background-upload-using-starling-and-workling
EDIT:
It's not possible, through a normal HTML multipart form, to send files to the background; they have to be handled within that request. If you are looking for a way around that, you can try SWFUpload and then, once the upload is done, use a background process to handle the Amazon S3 transfer.
This is also a good survey blog post: http://4loc.wordpress.com/2010/03/10/background-jobs-in-ruby-on-rails/
I like SWFUpload; we use it on some S3 apps that we wrote. It's proven to be very fast and stable. You can have actions fired off via AJAX after the uploads, etc. We have had a ton of uploads go through it with zero failures.
