A place to store files (Ruby on Rails)

I'm new to Rails and I want to make a website for uploading files: music, videos, pictures, text. What is the best way to store the files? I've read about different methods: in a database, as files on disk, or on Amazon S3.
There will be a lot of files, ranging from about 1 KB to 20 MB each.
Thanks!

Storing files in a database is not bad per se. It depends on the kind of database.
Storing files in a relational database is not viewed as good practice, for the reasons explained by bassneck.
But there are other kinds of databases that are specifically designed to store any kind of data in a non-relational way, including files of any kind. Dhruva's answer highlights this: MongoDB is pretty good, and its support for storing files using GridFS is awesome.
GridFS can, for example, stream only parts of a file, which is pretty useful for video.
In your specific case (many small files of many kinds), GridFS is a real option. I use Heroku and mongohq.com, and they work like a charm.
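To illustrate why streaming part of a file is cheap in a chunked store, here is a minimal sketch in plain Ruby. It is not the GridFS API: the class name ChunkedStore, the in-memory Hash and the tiny chunk size are all assumptions made for the example. The idea is only that a file is split into fixed-size chunks, so a byte range can be served by loading the chunks that overlap it.

```ruby
# Hypothetical in-memory stand-in for a chunked file store.
# ChunkedStore and its methods are illustrative names, not GridFS API.
class ChunkedStore
  attr_reader :chunks_read

  def initialize(chunk_size)
    @chunk_size = chunk_size
    @chunks = {}      # [file_id, chunk_index] => bytes
    @lengths = {}     # file_id => total length in bytes
    @chunks_read = 0  # counts how many chunks range reads have loaded
  end

  # Split the data into fixed-size chunks; returns the number of chunks.
  def put(file_id, data)
    bytes = data.b
    @lengths[file_id] = bytes.bytesize
    index = 0
    offset = 0
    while offset < bytes.bytesize
      @chunks[[file_id, index]] = bytes.byteslice(offset, @chunk_size)
      offset += @chunk_size
      index += 1
    end
    index
  end

  # Return up to `length` bytes starting at `offset`, loading only the
  # chunks that overlap the requested range.
  def read_range(file_id, offset, length)
    total = @lengths.fetch(file_id)
    return "".b if length <= 0 || offset >= total

    last_byte = [offset + length, total].min - 1
    first_chunk = offset / @chunk_size
    last_chunk = last_byte / @chunk_size

    buffer = "".b
    (first_chunk..last_chunk).each do |i|
      buffer << @chunks.fetch([file_id, i])
      @chunks_read += 1
    end

    start_in_buffer = offset - first_chunk * @chunk_size
    buffer.byteslice(start_in_buffer, last_byte - offset + 1)
  end
end

store = ChunkedStore.new(4)
store.put("clip", "abcdefghijklmnopqrstuvwxyz")
puts store.read_range("clip", 9, 5)
puts store.chunks_read
```

A real store would keep the chunks as separate records rather than in a Hash, but the offset arithmetic is the same: seeking into the middle of a video does not require reading the file from the start.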

I don't think there is one right answer to that. I use Heroku, so my only option is Amazon S3 (which has free quotas for the first year), and I'm pretty satisfied with it. I use the CarrierWave gem for uploading files and it's really easy to use. I much prefer it over Paperclip.
If your hosting provides a lot of space and bandwidth, then you could give local storage a try.
But as for the database, I really don't like the idea of storing files in it.
Update:
Reading a filename from a table should be faster than reading the file itself. The bigger the DB, the longer a backup takes. And if you take Heroku as an example, you only get 20 GB for a shared DB, which is not much if you're going to store 10-20 MB files in it. Moving the project to another host might be easier if you store the files in the DB, but if you use an external service such as S3, moving makes no difference to your files.

Amazon S3 integration on Heroku is relatively easy to set up and get working with the Paperclip gem. Heroku has some documentation on how to get this up and running.
Take a look at their documentation and see if this is the kind of thing you're looking for.
http://devcenter.heroku.com/articles/s3

I suggest you also check out and consider MongoDB GridFS - http://www.mongodb.org/display/DOCS/GridFS+Specification. My experience with GridFS has been good. I cannot give you a comparative analysis against the other options, though I would be interested to see one.

I would recommend you use Amazon S3 to store your files. It is the best.

Related

How are videos stored on web servers these days?

I'm building a web app that needs to store some resources, including but not limited to articles, pictures and videos. My question is: how are videos (mp4/ogg) stored on a web server? As bare files, or as binaries in a relational or NoSQL DB?
The question of BLOB data almost always comes down to "don't BLOB data". There are very few cases where it makes more sense to write a database connector for your data than to just keep it on disk.
The general trend is to use an established library that employs good design patterns, such as Paperclip for Ruby, and tailor it to your needs.
Using an external storage service is also a good idea. For example, Amazon S3 will store all of your data for pennies per gigabyte, and they'll do an excellent job of it.
If you do decide to cook up your own server that handles data internally, might I recommend DigitalOcean? I have been very happy with the SSD servers I have set up there (which are super fast).
For video you will almost certainly need a web server that is capable of streaming the file. I think Nginx has this feature.
I think you need to elaborate a bit on the use case you want to implement for this app. Only then can you get a precise answer.
To help with that, here are some questions you need to ponder:
1- You said you want to store videos. What are your requirements beyond storage?
2- Do you want, for example, to offer third-party users access to these videos and keyword search?
3- If yes, what kind of information is available about the videos? What is the expected average size of these files?
Many database engines offer the possibility of storing big binary files, but that comes with an impact on performance. That's why most storage systems that deal with big files store the files themselves on disk and keep any related metadata (file name, last updated, associated keywords, etc.) in the database. That makes for a scalable system.
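The split between "bytes on disk" and "metadata in the database" can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: the helper name store_upload is hypothetical, a Hash stands in for the database row, and naming the file after its SHA-256 digest is one possible choice, not something the answers above prescribe.

```ruby
require "digest"
require "fileutils"
require "tmpdir"

# Hypothetical helper: writes the bytes to disk and returns the metadata
# that would be saved as a database row (a plain Hash stands in for it).
def store_upload(root, original_name, data)
  digest = Digest::SHA256.hexdigest(data)
  path = File.join(root, digest)  # assumption: file is named by its digest
  FileUtils.mkdir_p(root)
  File.binwrite(path, data) unless File.exist?(path)

  {
    name: original_name,   # what the user called the file
    path: path,            # where the bytes actually live
    size: data.bytesize,
    sha256: digest,
    updated_at: Time.now
  }
end

# Usage: the "database" only ever sees the small metadata Hash.
Dir.mktmpdir do |root|
  row = store_upload(root, "song.mp3", "fake audio bytes")
  puts row[:name]
  puts row[:size]
end
```

Listing or searching files then touches only the small rows; the large file is read from disk (or handed to the web server) when someone actually downloads it.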
I'll edit this answer, if you find it useful and have further related-questions.
Unlimited file storage is difficult to set up without AWS S3. S3 is a cheap and scalable solution, but it gets expensive without proper caching, so we use an Nginx S3 proxy that works well: https://stackoverflow.com/a/44749584/290338

What are the performance considerations of using Amazon SimpleDB?

I'm creating a filesystem and I think I'll be storing files in a DB (http://sietch.net/ViewNewsItem.aspx?NewsItemID=124 and http://blog.druva.com/2009/01/25/file-systems-vs-databases/ seem to indicate it's a good idea).
Since it's a filesystem, I'll need A LOT of I/O and REALLY fast. If I'm hosting on EC2, will Amazon SimpleDB be a decent solution for this?
SimpleDB has a maximum value size of about 1,000 bytes, so it is very poorly suited to storing files or blobs (unless they are tiny).
It is fairly common for people to use SimpleDB to index files and then to store the files in S3 which is much better suited for storing large objects.
SimpleDB (and the more recent DynamoDB) aren't suited to storing files. Why don't you just keep the version control in one of them, indexing files that are stored on S3?
You don't need to overwrite the files on S3, as you can name them whatever you want and get the original name (and other info) from the database. For text files, for example, you can even keep a preview or the beginning of the file in the DB, and for images you can keep a thumbnail on S3 and reference it from the DB. That way, when users list the files they get only the thumbnail, and they download the full-size file only if they want it.
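As a rough sketch of that indexing idea in plain Ruby: each stored version gets a freshly generated object key, while the original name, the version number and (for text files) a short preview live in the index entry. The function name build_index_entry, the field names and the 64-byte preview size are assumptions made for the example; the Hash stands in for a SimpleDB or DynamoDB item and nothing here talks to AWS.

```ruby
require "securerandom"

# Assumed preview length; pick whatever fits your listing page.
PREVIEW_BYTES = 64

# Hypothetical builder for one index entry. `key` is the name the object
# would be stored under in S3; it never collides with earlier versions.
def build_index_entry(original_name, data, version)
  ext = File.extname(original_name)
  entry = {
    key: "#{SecureRandom.uuid}#{ext}",
    original_name: original_name,
    version: version,
    size: data.bytesize
  }
  # For text files, keep the beginning of the file next to the metadata.
  # Note: byteslice may cut a multi-byte character in half.
  entry[:preview] = data.byteslice(0, PREVIEW_BYTES) if ext.downcase == ".txt"
  entry
end

v1 = build_index_entry("notes.txt", "first draft", 1)
v2 = build_index_entry("notes.txt", "second draft", 2)
puts v1[:original_name] == v2[:original_name]
puts v1[:key] == v2[:key]
```

Because every version has its own key, uploading a new version never overwrites an old object; the index decides which key is "current" for a given original name.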
Take a look at http://aws.amazon.com/en/dynamodb/faqs/#When_should_I_use_Amazon_DynamoDB_vs_Amazon_S3 and http://aws.amazon.com/en/running_databases/

How should I (intelligently) store and archive large xml files for a data import

We've got a Rails app that processes large amounts of XML data imports. Right now we're storing these ~5 MB XML docs in Postgres. This is not ideal, given that we use each XML doc only once or twice for parsing. We'd like an intelligent way of storing and archiving these docs, without overly complicating the retrieval process for the sake of space. We've considered moving the docs to Mongo (which we're also using), but then aren't we just artificially boosting the memory requirements of our Mongo DB servers?
What's the best way for us to deal with this?
If you use each file only once or twice for parsing, I would just store a link to the file in the DB and then load the file from that link. Another approach is to use an XML DB, e.g. eXist.
You could try eXist, an XML database. If you are just archiving them, though, why not just store them in a directory tree?
You may want to look into DB2's PureXML capabilities. To play with it, you can download the free DB2 Express-C version here. For the record, IBM is also the only database provider officially supporting their Ruby driver and Rails adapter, so you wouldn't be on your own.
What harm are they doing where they are? They will take up 'space' wherever you put them.
If you are confident you will never need them again, then there is a case for archiving them to less expensive storage (e.g. tape?). Otherwise, whatever you do will 'overly complicate the retrieval process'.
You could consider compressing them in place if you are not already doing so.
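Compressing in place needs nothing beyond Ruby's standard library. The sketch below uses Zlib.gzip and Zlib.gunzip; the helper names compress_xml and decompress_xml and the sample document are made up for the example, and it assumes the XML is UTF-8.

```ruby
require "zlib"

# Compress an XML document before archiving it.
def compress_xml(xml)
  Zlib.gzip(xml)
end

# Restore the original document for parsing.
# Assumption: the archived XML was UTF-8 encoded.
def decompress_xml(blob)
  Zlib.gunzip(blob).force_encoding(Encoding::UTF_8)
end

# Repetitive markup, as is typical for data imports, compresses well.
xml = "<items>" + ("<item><name>widget</name></item>" * 1000) + "</items>"
packed = compress_xml(xml)

puts "#{xml.bytesize} bytes -> #{packed.bytesize} bytes"
puts decompress_xml(packed) == xml
```

The compressed bytes can stay wherever the documents live today (a binary column or a file), so the retrieval path only gains one decompression call.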

Carrierwave or Dragonfly

I have been looking into Rails file upload tools, and the ones that seemed most appealing and interesting to me were CarrierWave and Dragonfly.
From looking around, it seems that CarrierWave takes the more traditional approach, where you process the file on save, whereas Dragonfly is middleware, so it lets you process on the fly.
I was wondering if anyone has references to performance tests, or any tests, that compare the two.
I'm also curious what people's opinions are about both, which one they prefer and, of course, why.
It depends on the setup. As Senthil writes, as long as you have a cache proxy in front, Dragonfly is fine.
But if you are using the built-in rails caching, Carrierwave will perform better, as the files can be loaded without any processing. If you don't do any processing, it doesn't matter.
Here's how I summarized when considering both for Images on a project with Mongomapper:
Carrierwave:
Pros
Generates thumbs on upload (saves CPU time)
Can use files directly from a static/cached document
Doesn't need any cache-front
Supports various storage backends (S3, Cloudfiles, GridFS, normal files); easy to extend to new storage types if needed.
Cons
Generates thumbs on upload (difficult to generate new thumb sizes)
Doesn't natively support mongomapper
Uses storagespace for every file/thumb generated. If you use normal file storage, you might run out of inodes!
Dragonfly:
Pros
Should work with mongomapper, as it only extends ActiveModel
Generates thumbs on the fly (easier to create new layouts/thumbsizes)
Only one file stored! Saves space :)
Cons
Eats CPU on every request if you don't have a cache-proxy, rack::cache or similar.
No way to access thumbs as files if needed.
I ended up using both in the end.
A future wish is for CarrierWave to support MongoMapper again. After using both in various situations, I've found that the features in MongoMapper (rails3 branch) always work and are easy to extend using plugins. I cannot say the same for Mongoid as of now, but that might change.
I use Dragonfly simply because CarrierWave dropped support for MongoMapper, and Paperclip doesn't work with MongoMapper without some hacks.
Dragonfly does processing on the fly, i.e. it is meant to be used behind a cache proxy such as Varnish, Squid or Rack::Cache, so that while the first request may take some time, subsequent requests should be super quick!
Paperclip
Paperclip is intended as an easy file attachment library for Active Record. The intent behind it was to keep setup as easy as possible and to treat files as much like other attributes as possible. This means they aren't saved to their final locations on disk, nor are they deleted if set to nil, until ActiveRecord::Base#save is called. It manages validations based on size and presence, if required. It can transform its assigned image into thumbnails if needed, and the prerequisites are as simple as installing ImageMagick (which, for most modern Unix-based systems, is as easy as installing the right packages). Attached files are saved to the filesystem and referenced in the browser by an easily understandable specification, which has sensible and useful defaults.
Advantages
Validations: Paperclip introduces several validators to validate your attachment:
AttachmentContentTypeValidator
AttachmentPresenceValidator
AttachmentSizeValidator
Deleting an Attachment
Set the attribute to nil and save.
@user.avatar = nil
@user.save
Paperclip fits best in an organic Rails environment that uses Active Record rather than one of the alternatives. It is much easier to handle for beginning Rails developers, and it also has advanced capabilities for the advanced developer.
I'm a huge fan of Paperclip because it doesn't require RMagick, it's very easy to set up to post through to Amazon S3, and declaring everything in the models (validations, etc.) keeps things clean.
With respect to multiple file uploads and progress feedback, both are possible with both Paperclip and Attachment_fu, but both typically require some elbow grease with iframes and Apache to get working.
CarrierWave
This gem provides a simple and extremely flexible way to upload files from Ruby applications. It works well with Rack based web applications, such as Ruby on Rails.
Advantages
Simple Model entity integration. Adding a single string image attribute for referencing the uploaded image.
"Magic" model methods for uploading and remotely fetching images.
HTML file upload integration using a standard file tag and another hidden tag for maintaining the already uploaded "cached" version.
Straight-forward interface for creating derived image versions with different dimensions and formats. Image processing tools are nicely hidden behind the scenes.
Model methods for getting the public URLs of the images and their resized versions for HTML embedding.
If you are using the built-in Rails caching, CarrierWave will perform better, as the files can be loaded without any processing. If you don't do any processing, it doesn't matter.
Generates thumbs on upload (saves CPU time)
Can use files directly from a static/cached document
Doesn't need any cache-front
Supports various storage backends (S3, Cloudfiles, GridFS, normal files) easy to extend to new storage types if needed.
It doesn't clutter your models with configuration. You define uploader classes instead, which lets you easily reuse and extend your upload configuration.
What we liked most is that CarrierWave is very modular. You can easily switch your storage engine between the local file system, cloud-based AWS S3, and more. You can switch the image processing module between RMagick, MiniMagick and other tools. You can also use the local file system in your development environment and switch to S3 storage in production. CarrierWave has good support for other ORMs such as DataMapper, Mongoid and Sequel, and can even be used with a third-party image management service such as Cloudinary. The solution seems the most complete, with support for just about anything, but it is also much messier (for me at least), since there is a lot more code that you need to handle. You have to appreciate the modular approach that CarrierWave takes. It's agnostic as to which of the popular S3 clients you use, supporting both aws/s3 and right_aws. It's also ORM agnostic and not tightly coupled to Active Record. The tight coupling of Paperclip has caused us some grief at work.
Disadvantages
You can't validate file size. There is a wiki article that explains how to do it, but it does not work.
Integrity validations do not work when using MiniMagick (very convenient if you are concerned about RAM usage). You can upload a corrupted image file and CarrierWave will throw an error at first, but the next time will swallow it.
You can't delete the original file. You can instead resize it, compress, etc. There is a wiki article explaining how to do it, but again it does not work.
It depends on external libraries such as RMagick or MiniMagick. Paperclip works directly with the convert command line tool (ImageMagick). So if you have problems with MiniMagick (I had), you will lose hours digging through Google searches. Both RMagick and MiniMagick are abandoned at the time of this writing (I contacted the author of MiniMagick and got no response).
It needs some configuration files. This is seen as an advantage, but I don't like having single configuration files around my project just for one gem. Configuration in the model seems more natural to me. This is a matter of personal taste anyway.
If you find some bug and report it, the dev team is really absent and busy. They will tell you to fix bugs yourself. It seems like a personal project that is improved in spare time. For me it's not valid for a professional project with deadlines.
Doesn't natively support mongomapper
Uses storagespace for every file/thumb generated. If you use normal file storage, you might run out of inodes!
Dragonfly
The impressive thing about Dragonfly, the thing that separates it from most other image processing plugins, is that it allows on-the-fly resizing from the view.
Not needing to configure thumbnail sizing or other actions in a separate file is a huge time and frustration saver. It makes Rails view code like image_tag @product.image.thumb('150x150#') possible.
The magic is all made possible by caching. Instead of building the processed versions on upload and then linking to individual versions of the image, the plugin generates images as they are requested. While this is a cost on the first load, the newly created image is HTTP-cached for all subsequent loads, by default using Rack::Cache, though other more robust solutions are available should scaling become an issue.
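The "generate on first request, serve from cache afterwards" pattern can be shown without any image library. In this sketch the class OnTheFlyThumbs, its fetch method and the fake processor block are all hypothetical; it is not Dragonfly's code, and a plain Hash plays the role that Rack::Cache or a caching proxy plays in a real deployment.

```ruby
# Hypothetical stand-in for on-the-fly processing behind a cache.
# `processor` is any block that turns (source, geometry) into a result;
# it is faked here, since real resizing would need an image library.
class OnTheFlyThumbs
  attr_reader :jobs_run

  def initialize(&processor)
    @processor = processor
    @cache = {}    # [source, geometry] => processed result
    @jobs_run = 0  # how many times the expensive step actually ran
  end

  # Process only on a cache miss; later requests reuse the stored result.
  def fetch(source, geometry)
    @cache.fetch([source, geometry]) do |key|
      @jobs_run += 1
      @cache[key] = @processor.call(source, geometry)
    end
  end
end

thumbs = OnTheFlyThumbs.new { |source, geometry| "#{source} resized to #{geometry}" }

thumbs.fetch("product.jpg", "150x150#")  # first request: processed
thumbs.fetch("product.jpg", "150x150#")  # repeat request: served from cache
thumbs.fetch("product.jpg", "300x300#")  # new size: processed on demand
puts thumbs.jobs_run
```

Only the original is stored, and a new size costs nothing until someone asks for it. The trade-off is that without a cache in front, every request would rerun the processor.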
Advantages
Will I be changing image sizes often? For example, if you want to let your users change the size of their pictures, if you need flexibility in size for some other reason, or if you want really fast development.
Yes: Dragonfly
No: either Carrierwave or Paperclip
Can be used with mongomapper with no trouble
Performance should be fine as long as you use a caching proxy
Should work with mongomapper (it only extends ActiveModel)
Generates thumbs on the fly (easier to create new layouts/thumbsizes)
Only one file stored! Saves space
Processing done on the fly (is meant to be used behind a cache-proxy such as Varnish, Squid or Rack::Cache, so that while the first request may take some time, subsequent requests should be super-quick)
Disadvantages
Eats CPU on every request if you don't have a cache-proxy, rack::cache or similar.
No way to access thumbs as files if needed.
References
Ruby on Rails Image Uploads with CarrierWave and Cloudinary
Rails 3 paperclip vs carrierwave vs dragonfly vs attachment_fu
Other people wrote pretty good summaries, I just would like to say that from our experience Dragonfly setup needed more maintenance, and because of negligence of some developer(s) along the way we were also stuck with plenty of orphan images which lingered after the original was removed. This wouldn't have happened with a vanilla carrierwave.
P.S. We migrated to cloudinary (and use carrierwave with it) and are happy with it.

RoR - Images in DB tables?

I want a user to upload multiple images (+ thumbs) and give a description about their pics.
What do I need to do to create this the Ruby way?
Do I create the tables manually (and which ones), or which gem do I need?
I want to store the physical file at a path on disk and store the link (plus attribute information) in the DB, if that is the best solution.
I am open to any alternatives to seek my best solution! :-)
Look at Paperclip. Another great solution for handling multiple images for an item is paperclippolymorph.
I'm not sure there is a "best" solution; whether or not you store the images in the database is a tradeoff. Storing images on the server's filesystem and keeping each file's path in the database will keep your DB smaller, but it also adds one more folder/location that you need to keep backed up, and it can introduce security problems (if the image storage folder is not properly secured, it can be easier for an attacker to pull images off a filesystem than to extract them from a database).

Resources