How to manage Amazon S3 bucket size? - ruby-on-rails

I am using Amazon S3 for saving my uploads in my Rails application.
But the bucket size is growing very rapidly. I have used the Kraken image optimizer for compressing images, but I want to know what else I can do to manage the bucket size.

This depends on what your use case is, e.g. whether you always need access to the files. Optimizing / resizing uploads is probably a good idea, but you can also have a look at S3 lifecycle management. With this feature you can, for example, delete old files or move them to AWS Glacier. See the reference for an example of how to set this up using the AWS console.
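If you'd rather configure it in code than through the console, here is a minimal sketch using the aws-sdk-s3 gem; the bucket name, prefix, and day counts are placeholder assumptions:

```ruby
require 'aws-sdk-s3'

s3 = Aws::S3::Client.new(region: 'us-east-1') # placeholder region

# Transition uploads to Glacier after 30 days and delete them after a year.
# 'my-uploads-bucket' and the 'uploads/' prefix are hypothetical.
s3.put_bucket_lifecycle_configuration(
  bucket: 'my-uploads-bucket',
  lifecycle_configuration: {
    rules: [
      {
        id: 'archive-then-expire-uploads',
        status: 'Enabled',
        filter: { prefix: 'uploads/' },
        transitions: [{ days: 30, storage_class: 'GLACIER' }],
        expiration: { days: 365 }
      }
    ]
  }
)
```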

Related

CarrierWave stores images locally, not on S3, at Heroku

I am using CarrierWave to upload images and display them in a photo gallery. CarrierWave stores files at public/uploads, but these images are not getting displayed on Heroku. I found that Heroku is read-only and we should store files on S3.
Are there any other alternatives to S3? If yes, can you please share them here?
Heroku is only read-only if you're on the Bamboo stack (old). Cedar uses an ephemeral writable filesystem, which means that whilst you can upload, it gets wiped with every deploy.
S3 is not your only option; it's just Amazon's storage system. You've got Dropbox, Azure, Rackspace and a bunch of others which provide similar functionality.
Your question should really be: which storage solution is right for my app?
The main issue is the location of your files -- they need to be close to your app to reduce latency. We've had a problem recently hosting S3 files through a Rackspace app: because S3 is not in Rackspace's datacenter, the latency was high.
Because Heroku is built on Amazon's AWS cloud, serving assets from S3 is the most efficient and logical way to provide your assets.
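For completeness, here is a minimal sketch of pointing CarrierWave at S3 via the fog-aws gem instead of public/uploads; the bucket name and region are placeholder assumptions:

```ruby
# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
  config.fog_provider = 'fog/aws'               # requires the fog-aws gem
  config.fog_credentials = {
    provider:              'AWS',
    aws_access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
    region:                'us-east-1'           # placeholder region
  }
  config.fog_directory = 'my-app-uploads'        # hypothetical bucket name
end
```

Then in your uploader class, use storage :fog instead of storage :file.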

How can I upload images in bulk (about 100-200 images at a time) in Ruby on Rails?

I want to upload images (around 200 kB each) in bulk. We have multiple options such as CarrierWave, Paperclip, and others. How can I perform these uploads in bulk?
Like other things in computer science, the answer is it depends™. What I really mean is:
Are end users going to be uploading these? If yes, use the jQuery File Upload plugin to present an easy-to-use interface.
For storage, you can store the images on your own server, or even better, upload them directly from the user's computer to Amazon S3. Here is an example of Uploading Files to S3 in Ruby with Paperclip.
Obviously you will need to move this into background jobs, where the images are uploaded with AJAX and processed in separate jobs (see the sketch below). If you don't already have a favourite queue system, I would suggest Resque or Sidekiq.
Note: if you choose to upload images directly to S3 via CORS, then your Rails server is freed from managing file uploads. Direct uploads are suggested if you have large images or a large number of files to be uploaded.
However, direct uploads limit your ability to modify the images (resize, etc.), so keep that in mind if you are choosing the direct upload solution.
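As a rough illustration of the queue suggestion above, here is a minimal Sidekiq sketch; the Photo model and its process! method are hypothetical stand-ins for whatever your uploader actually does:

```ruby
# app/workers/image_processing_worker.rb
class ImageProcessingWorker
  include Sidekiq::Worker

  def perform(photo_id)
    photo = Photo.find(photo_id)  # hypothetical model
    photo.process!                # hypothetical: resize, upload to S3, etc.
  end
end

# Enqueue one lightweight job per image rather than one giant request:
photo_ids.each { |id| ImageProcessingWorker.perform_async(id) }
```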
TL;DR
Don't. Rails isn't optimized for bulk uploads, so do it out-of-band whenever you can.
Use FTP/SFTP
The best way to deal with large volumes of files is to use an entirely out-of-band process rather than tying up your Rails process. For example, use FTP, FTPS, SCP, or SFTP to upload your files in bulk. Once the files are on a server, post-process them with a cron job, or use inotify to kick off a rake task.
NB: Make sure you pay attention to file-locking when you use this technique.
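For instance, a cron-driven Rake task along these lines could sweep the drop directory; the path, glob, and ProcessImageJob are hypothetical:

```ruby
# lib/tasks/uploads.rake
namespace :uploads do
  desc 'Enqueue processing for files dropped via FTP/SFTP'
  task ingest: :environment do
    Dir.glob('/srv/ftp/incoming/*').each do |path|
      next unless File.file?(path)
      # Crude guard against files still being written (see the file-locking note above).
      next if File.mtime(path) > Time.now - 60
      ProcessImageJob.perform_later(path)  # hypothetical job; use your queue of choice
    end
  end
end
```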
Use a Queue
If you insist on doing it via Rails, don't upload hundreds of files. Instead, upload a single archive containing your files to be processed by a background job or queue. There are many alternatives here, including Sidekiq and RabbitMQ among others.
Once the archive is uploaded and the queued job submitted, your queue process can unpack the archive and do whatever else needs to be done. This type of solution scales very well.
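A hedged sketch of the unpacking step using the rubyzip gem, assuming the queued job is handed the path of the uploaded archive:

```ruby
require 'zip'        # rubyzip gem
require 'fileutils'

# Extract every entry of the uploaded archive into a working directory;
# each extracted file can then be handed to whatever processing comes next.
def unpack_archive(archive_path, dest_dir = '/tmp/unpacked')
  FileUtils.mkdir_p(dest_dir)
  Zip::File.open(archive_path) do |zip|
    zip.each do |entry|
      target = File.join(dest_dir, entry.name)
      FileUtils.mkdir_p(File.dirname(target))
      entry.extract(target) { true }  # block returning true overwrites existing files
    end
  end
end
```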

What is the best solution for saving images within a Rails application?

Initially I wanted to host my application on Heroku, but since the file-system on Heroku is read-only, I would need to store uploaded images on Amazon S3 or something similar.
The pictures mostly have mobile phone camera quality (I think something between 500kb - 1MB). I would like to also create thumbnails of those pictures with Rails and save them.
Since I don't know how much traffic I will have, the whole system should be scalable.
Is there a better/cheaper alternative to the above (Heroku + S3), e.g. storing images in the database or using other hosts?
This really depends on whether you want to stay with a PaaS (i.e. Heroku, Azure, etc.) or go with an IaaS (i.e. AWS). Given that you mentioned Heroku, I will assume you want a PaaS. I'm not sure of the exact cost difference between services (but I can get this for you if needed), but combining Heroku + S3 + (Paperclip || CarrierWave) = an incredibly fast solution that scales. Then in the future you can look into cutting costs, once you prove your idea.
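As a hedged sketch of that combination, a Paperclip attachment with S3 storage and the thumbnails you mentioned might look like this; the model name, bucket, and style geometry are assumptions:

```ruby
# app/models/picture.rb
class Picture < ActiveRecord::Base
  has_attached_file :image,
    styles: { thumb: '200x200>' },          # Paperclip generates the thumbnail
    storage: :s3,
    s3_credentials: {
      bucket:            ENV['S3_BUCKET'],  # hypothetical bucket name
      access_key_id:     ENV['AWS_ACCESS_KEY_ID'],
      secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
    }

  validates_attachment_content_type :image, content_type: /\Aimage\/.*\z/
end
```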

Uploading files to EC2, first to an EBS volume, then moving to S3

http://farm8.staticflickr.com/7020/6702134377_cf70482470_z.jpg
OK, sorry for the terrible drawing, but it seemed a better way to organize my thoughts and convey them. I have been wrestling for a while with how to create an optimal, decoupled, easily scalable system for uploading files to a web app on AWS.
Uploading directly to S3 would work, except for the fact that the files need to be instantly accessible to the uploader for manipulation; then, once manipulated, they can go to S3, where they will be served to all instances.
I played with the idea of creating a SAN with something like GlusterFS, then uploading directly to that and serving from it. I have not ruled it out, but from varying sources the reliability of this solution might be less than ideal (if anyone has better insight on this I would love to hear it). In any case, I wanted to formulate a more "out of the box" (in the context of AWS) solution.
So to elaborate on this diagram: I want the file to be uploaded to the local filesystem of whichever instance it happens to go to, which is an EBS volume. The storage location of the file would not be served to the public (i.e. /tmp/uploads/). It could still be accessed by the instance through a readfile() operation in PHP, so that the user could see and manipulate it right after uploading. Once the user is finished manipulating the file, a message to move it to S3 could be queued in SQS.
My question is then once I save the file "locally" on the instance (which could be any instance due to the load balancer), how can I record which instance it is on (in the DB) so that subsequent requests through PHP to read or move the file will find said file.
If anyone with more experience in this has some insight I would be very grateful. Thanks.
I have a suggestion for a different design that might solve your problem.
Why not always write the file to S3 first? And then copy it to the local EBS file system on whichever node you're on while you're working on it (I'm not quite sure what manipulations you need to do, but I'm hoping it doesn't matter). When you're finished modifying the file, simply write it back to S3 and delete it from the local EBS volume.
In this way, none of the nodes in your cluster need to know which of the others might have the file because the answer is it's always in S3. And by deleting the file locally, you get a fresh version of the file if another node updates it.
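The question's code is PHP, but as a language-neutral illustration of that S3-first flow, here is a minimal Ruby sketch with the aws-sdk-s3 gem; the bucket, key, and local path are placeholders:

```ruby
require 'aws-sdk-s3'

s3  = Aws::S3::Resource.new(region: 'us-east-1')          # placeholder region
obj = s3.bucket('my-uploads').object('uploads/photo.jpg') # hypothetical bucket/key

local_path = '/tmp/uploads/photo.jpg'
obj.download_file(local_path)   # copy from S3 to this node's EBS volume
# ... manipulate the local file here ...
obj.upload_file(local_path)     # write the modified version back to S3
File.delete(local_path)         # remove the local copy so S3 stays canonical
```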
Another thing you might consider, if it's too expensive to copy the file from S3 every time (it's too big, or you don't like the latency), is turning on session affinity in the load balancer (AWS calls this sticky sessions). This can be handled by your own cookie or by the ELB. Now subsequent requests from the same browser come to the same cluster node. Simply check the modified time of the file on the local EBS volume against the S3 copy, and replace it if the S3 copy is more recent. That way you get to take advantage of the local EBS filesystem while the file's being worked on.
Of course there are a bunch of things I don't get about your system. Apologies for that.

Heroku + Paperclip + Amazon S3 - Pricing?

Since Heroku has a read-only filesystem, I can't use Paperclip to store a small quantity of files on the server. Database image storage is an option, but not particularly ideal, since that may crank my client's DB size up from a few hundred KB to over the 5 MB 'free' shared DB limit (depending on the size of the images).
That leaves Amazon S3 as a likely solution. I understand that Heroku is hosted on EC2 (I believe?). Amazon's pricing wording was a little bit confusing when referring to S3-EC2 file transfers. If I have my client set up an S3 account and let them do file transfers to and from there, what is the pricing going to look like?
Is it cheaper from an S3 point of view to both upload and download data in the Rails controllers, and then feed the data to the browser using send_file? Or would it make more sense to just link straight to the image or PDF from the browser like normal?
Would my client have to pay anything at all, since Heroku is hosted on Amazon? I was looking for other questions related to this, but there weren't any really straight answers concerning which parts of the file transfer would be charged for.
I guess the storage would cost a little (hardly anything), but what about the bandwidth? Thanks :)
Is it cheaper from an S3 point of view to both upload and download data in the Rails controllers, and then feed the data to the browser using send_file? Or would it make more sense to just link straight to the image or PDF from the browser like normal?
From an S3 standpoint, yes, this would be free, because Heroku would be covering your transfer costs. HOWEVER: Heroku only lets a script run for 30 seconds, and during that time other clients won't be able to load the site, so this is really a terrible idea. Your best bet is to serve the files out of S3 directly, in which case your customer would pay for transfer between S3 and the end user.
Any interaction you have with the file from Heroku (i.e. metadata and what not) will be free because it is EC2->S3.
For most cases, your pricing would be identical to what it would be if you were not using Heroku. The only case where this would change is if your app is constantly accessing the data directly on S3 (to read metadata/load files).
You can use Paperclip on Heroku - just not the local filesystem for storage. Fortunately, Paperclip can use S3 for storage. Heroku has a tech article here that covers it.
Also, when an uploaded asset is displayed on a page (look up asset_host), the image is loaded directly from your S3 bucket's URL, so you will pay Amazon for a GET request to the image and for the data transfer involved, but also for storing the assets on S3. Have you looked at the S3 calculator to get indicative costs?
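To make the "link straight to it" option concrete: with Paperclip storing on S3, the attachment URL already points at the bucket, so the browser fetches the file from S3 and your dyno never proxies the bytes. A hedged sketch (the Picture model and thumb style are assumptions):

```ruby
url = @picture.image.url(:thumb)
# => e.g. "https://s3.amazonaws.com/my-bucket/pictures/images/000/000/001/thumb/photo.jpg"
# In a view: <%= image_tag @picture.image.url(:thumb) %>
# You pay S3 for the GET and the transfer; proxying through send_file would
# instead tie up a dyno for the duration of every download.
```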
