How videos are stored on web server these days? - ruby-on-rails

I'm building a web app that need to store some resources, including but not limited to articles, pictures and videos. My question here is how videos (mp4/ogg) are stored on web server? just as bare file or as binaries in relational or nosql db?

The question to BLOB data almost always comes down to "don't BLOB data". There are very few times that make more sense to write a database connector for your data then to just keep it on disk.
The general trend is to use an established service that employs good design patterns, such as Paperclip for ruby, and tailor it to your needs.
Using an external storage service is also a good idea, for example Amazon S3 will store all of your data for pennies on the dollar per gigabyte, and they'll do an excellent job of it.
If you do decide to cook up your own server that handles data internally, might I recommend digital ocean? I have been very happy with the SSD servers I have setup there (which are super fast).
For video you will almost certainly need a webserver that is capable of streaming the file. I think Nginx has this feature.

I think you need to elaborate a bit about the use case you wish to implement for this app. Only then you can have precise answer.
And to to help out with that, here are some questions you need to ponder:
1- You said you wanted to store videos, what are your requirements beyond storage?
2- do you wish for example to offer access to third party users to these videos and search with keywords?
3- If yes, what kind of information is available about the videos? what is the expected average size of these files?
Many database engines offer the possibility of storing big binary files, but that comes with an impact on performance. That's why most of the storage systems that deal with big files, store the files themselves on the disk and any related metadata (file name, last updated, associated keywords, etc.) are stored in the database. That makes for a scalable system.
I'll edit this answer, if you find it useful and have further related-questions.

An unlimited file storage is difficult to setup without AWS S3. S3 is cheap and scalable solution but expensive to use without proper caching, so we have Nginx S3 proxy that works well: https://stackoverflow.com/a/44749584/290338

Related

A place to store file ( Ruby on Rails )

I'm new to Rails, I wanted to make a website for uploading files: music, videos, pictures, text. What is a better way to store files? I've read about different methods: Database, as a file, Amazon S3?
There will be a lot of files around 1 kb to 20Mb each.
Thanks!
Storing files in a database is not bad, per se. It depends on the kind of database.
Storing files in a relational database is not view as a good practice by the reasons explained by bassneck.
But there are other kind databases that are specifically designed to store any kind of data, in a non-relational way, for example files of any kind. The answer of Dhruva highlight that, MongoDB is pretty good and its support for storing files using GridFS is awesome.
GridFS is very good, for example it can stream only parts of a file, pretty useful for video.
In your specific case – many small files of many kinds of data – GridFS is an real option. I use Heroku & mongohq.com and they work like a charm.
I don't think there is a right answer for that. I use heroku, so my only option is Amazon S3 (which has free quotas for the first year) and I'm pretty satisfied with it. I use carrierwave gem for uploading files and it's really easy to use. I really prefer it over Paperclip.
If your hosting provides a lot of space and bandwidth then you could give it a try.
But for the database, I really don't like the idea of storing files in it.
Updated
Reading a filename from the table should be faster than reading the file itself. The bigger the DB, the longer it will take to make a backup. And if you take heroku for example, you only get 20gb for a shared db. That's not much if you're gonna store 10-20mb files in there. Moving project to another hoster might be easier if you store files in the DB. But if you use an external service (such as S3), there would be no difference for you.
Amazon S3 integration on Heroku is relatively easy to setup and get working with the paperclip gem. Heroku has some documentation on how to get this up and running.
Take a look at their documentation and see if this is the kind of thing your looking for.
http://devcenter.heroku.com/articles/s3
I will suggest you to also check out & consider MongoDB GridFS - http://www.mongodb.org/display/DOCS/GridFS+Specification. My experience with GridFS has been good. I cannot give you a comparative analysis with the other options, however I would like to know the same.
I would recommend you use amazon s3 to store your files. It is the best.

Proper Amazon AWS S3 usage

Heres a quick rundown. We have two apps. Both apps are different from each other in every aspect.
Our first app is a company profile (4 page layout: home, products, about, contact). I don't think it will generate the same traffic as social web sites online. It allows staff members to post product photos. Is it better to have an AWS S3 account to store this content on? Or would I be better off storing the files on the local web server?
Our second app is more social oriented. Here, we have decided to use an S3 bucket for this app. Since we already have an AWS account, should we create a bucket for the first app on this AWS account? Or will this just render more expenses in the long run? What are your thoughts?
On another note. Should app related content (logo, icons, background images, buttons, etc) be stored on the S3 account or on the local server? What is the general consensus on this?
I would keep anything that is "hard-coded" into your site, IE the images in your CSS and anything in your HTML/ERB/HAML views that are referenced by their file name as part of the web server itself. You don't have to do this, but one reason to do it is if there's any interruption of Amazon S3 or any kind of configuration issue etc at any point in the future these 'basic' visual resources won't be affected, so your site will still look pretty much intact.
Plus, this kind of content has a specific appropriate place to live (your public folder) and rails helpers that make it really easy to access.
I just can't see a reason to mess with S3 if it's not going to be high volume of data and bandwidth.
On the other hand, if it IS going to be very high bandwidth and you're using a cheap hosting platform that doesn't give you tons of bandwidth, moving those images to S3 would save you some. But are you really in danger of exceeding your bandwidth limits on the server?
Lastly, I think your OTHER app is using S3 as expected, that's a good use of S3. I would not necessarily recommend mixing buckets with other apps, though. The simplest reason for this is, let's say one of your apps grows and at some point merges with, partners with, or is sold to another party. Really, imagine any scenario where another interest becomes involved in either app.
Now you've got no way to maintain separation of access between the resources your two different apps are using. So you have to either migrate to a new platform for one of them, or you have to deal with them being 'joined at the hip' so to speak.
In other words, I just don't see an upside to sharing AWS resources between two apps that aren't directly related to each other. No reason to make them a package deal if they aren't naturally a package deal.
On the other hand if it DOES make sense for them to be totally linked, then it doesn't really matter either way.
Hope that helps you think about it. Good luck!
s3 is ridiculously cheap so I'd put any image uploads on there.
In the past I've put static images, stylesheets and javascripts and so forth on s3 but finally I just leave them on the web server. Reason for this is jammit which is oh sooo cool in handling packaging, updating, compressing, etc, etc of assets.

Document management system: what to use as storage backend (docs content repository)?

I want to make a document management system (interface in Ruby).
What do profesional sollutions (Alfresco, Liferay social office, others) use for storing and versioning documents?
What else can I use?
Key points:
storage space optimization (deltas, compression ...)
versioning
ability to index docs (can be external)
ability to make backups at runtime (live hot-backup)
locking?
scalability on large data volume
ensure data integrity (hashing?)
permissions
transactional
Workflow support (optional)
Bonus points:
how does KnowledgeTree do it?
how does Liferay Social Office do it (jcr?) ?
how does Alfresco do it ?
Any books on this issue ?
Most of the enterprise document management solutions I've seen (Cimage, Documentum, LiveLink) definitely don't care about #1. Storage is relatively cheap, especially if it's storage vs processing (store and retreieve). They mostly rely on filesystem based storage - perhaps with name abstraction such that ShoppingList.doc perhaps becomes 20100909100101a.doc.rev1, with a database tracking the given-name, the stored name, revisions, and various other data {MIME type, headers & properties etc}. By not generating deltas + compression you get indexing very easily from any number of existing products/agorithms. Versioning is also extremely simple with this approach.
Depending on the size and scale you're building, you could also store versioned files within a database.
An (S)FTP or CIFS storage process would also allow your software to run on an app server with modest space, but store the files+history on a file or cloud server of some sort - although this isn't much different from filesystem based storage.
Y'know, my first instinct would be to just use Subversion. You'll need external indexing, and if you want to store hashes you'll need to write some code to do it yourself, but everything else fits.
It ships with ruby bindings, though I'm not personally familiar with their quality.
alfresco is commonly used as the backend, it has a nice REST API. You can also define your own integration API if you don't like the included one.

How do I allow safely and inexpensively allow images on my site?

I have developed a social networking site for gardeners website, and am interested in giving users the ability to add images to their "tweets".
If I allow them to upload images to the actual site, it seems like this will quickly become expensive (this is a side project, not funded by anyone than myself and my own obsessions). Let's say the site becomes moderately popular, with 100K users posting one image a week, of only 250K in size. That's (100000 * .1 * 52 / 1024) = 508 MB/year in storage (and that doesn't take into account increased bandwidth). Plus I'd have to increase the server load to scale the images. I'm not sure if I should just go ahead with this, or if there are better possibilities.
Linking to other sites seems better in some ways. You do have broken links, but a larger concern for me is security: XSS.
The application is on Rails 3, using MongoDB / Mongoid as the backend, if that matters.
I'm looking for solutions such as:
APIs that store images on external sites. What would be ideal is the ability to upload it to my site, and make an API call to store it on an external site.
APIs (perhaps Javascript APIs) that make it easy to link to one or more external image hosting sites securely.
Markdown or similar markup that allow linking to external images securely. I am interested in giving users the ability to format their posts in limited ways, so this might solve two problems at the same time. I notice that this is what Stack Overflow does.
Security libraries that whitelist image URL patterns
Advice on why I am thinking about this problem wrong. For example, maybe I should just store the images. 500MB a year is really not all that expensive, and it does allow me to create a very clean user experience.
My objectives are (in order):
- Secure, both for my own site, and to not allow XSS attacks against other sites
- Best possible user experience
- Easy to maintain and implement
What have you done to allow user-supplied images on your site?
You're thinking about the problem wrong ;) or rather not at the right time.
Don't worry about the bandwidth now, when you don't have that many users yet. Concentrate on making the site user friendly and popular first. Performance, bandwidth, disk space - these are the things you'll work on when they become problems. By the time you've 100k users the cost of buying that space and bandwidth on, say, Amazon S3 may not be an issue anymore.
Why not using a service like Amazon s3? Is cheap, very cheap (With the Reduced Redundancy Storage), and the most important plugins like Paperclip support it out of the box...
You will need to look at the T&C of picture hosts (flickr etc...) and see if your usage is applicable. Flickr has an API, not sure about the others just search for HOST api.
Flickrs api is at:
http://www.flickr.com/services/api/

Rails: Storing binary files in database [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
Using Rails, is there a reason why I should store attachments (could be a file of any time), in the filesystem instead of in the database? The database seems simpler to me, no need to worry about filesystem paths, structure, etc., you just look in your blob field. But most people seem to use the filesystem that it leaves me guessing that there must be some benefits to doing so that I'm not getting, or some disadvantages to using the database for such storage. (In this case, I'm using postgres).
This is a pretty standard design question, and there isn't really a "one true answer".
The rule of thumb I typically follow is "data goes in databases, files go in files".
Some of the considerations to keep in mind:
If a file is stored in the database, how are you going to serve it out via http? Remember, you need to set the content type, filename, etc. If it's a file on the filesystem, the web server takes care of all that stuff for you. Very quickly and efficiently (perhaps even in kernel space), no interpreted code needed.
Files are typically big. Big databases are certainly viable, but they are slow and inconvenient to back up etc. Why make your database huge when you don't have to?
Much like 2., it's really easy to copy files to multiple machines. Say you're running a cluster, you can just periodically rsync the filesystem from your master machine to your slaves and use standard static http serving. Obviously databases can be clustered as well, it's just not necessarily as intuitive.
On the flip side of 3, if you're already clustering your database, then having to deal with clustered files in addition is administrative complexity. This would be a reason to consider storing files in the DB, I'd say.
Blob data in databases is typically opaque. You can't filter it, sort by it, or group by it. That lessens the value of storing it in the database.
On the flip side, databases understand concurrency. You can use your standard model of transaction isolation to ensure that two clients don't try to edit the same file at the same time. This might be nice. Not to say you couldn't use lockfiles, but now you've got two things to understand instead of one.
Accessibility. Files in a filesystem can be opened with regular tools. Vi, Photoshop, Word, whatever you need. This can be convenient. How are you gonna open that word document out of a blob field?
Permissions. Filesystems have permissions, and they can be a pain in the rear. Conversely, they might be useful to your application. Permissions will really bite you if you're taking advantage of 7, because it's almost guaranteed that your web server runs with different permissions than your applications.
Cacheing (from sarah mei below). This plays into the http question above on the client side (are you going to remember to set lifetimes correctly?). On the server side files on a filesystem are a very well-understood and optimized access pattern. Large blob fields may or may not be optimized well by your database, and you're almost guaranteed to have an additional network trip from the database to the web server as well.
In short, people tend to use filesystems for files because they support file-like idioms the best. There's no reason you have to do it though, and filesystems are becoming more and more like databases so it wouldn't surprise me at all to see a complete convergence eventually.
There's some good advice about using the filesystem for files, but here's something else to think about. If you are storing sensitive or secure files/attachments, using the DB really is the only way to go. I have built apps where the data can't be put out on a file. It has to be put into the DB for security reasons. You can't leave it in a file system for a user on the server/machine to look at or take with them without proper securty. Using a high-class DB like Oracle, you can lock that data down very tightly and ensure that only appropriate users have access to that data.
But the other points made are very valid. If you're simply doing things like avatar images or non-sensitive info, the filesystem is generally faster and more convenient for most plugin systems.
The DB is pretty easy to setup for sending files back; it's a little bit more work, but just a few minutes if you know what you're doing. So yes, the filesystem is the better way to go overall, IMO, but the DB is the only viable choice when security or sensitive data is a major concern.
I don't see what the problem with blobstores is. You can always reconstruct a file system store from it, e.g. by caching the stuff to the local web server while the system is being used.
But the authoritative store should always be the database. Which means you can deploy your application by tossing in the database and exporting the code from source control. Done.
And adding a web server is no issue at all.
Erik's answer is great. I will also add that if you want to do any caching, it's much easier and more straightforward to cache static files than to cache database contents.
If you use a plugin such as Paperclip, you don't have to worry about anything either. There's this thing called the filesystem, which is where files should go. Just because it is a bit harder doesn't mean you should put your files in the wrong place. And with paperclip (or other similar plugins) it isn't hard. So, gogo filesystem!
Unable to find an up-to-date answer to this question I have implemented an
database service for Active Storage available since Rails 5.2 that works just like any other Active Storage service, but stores file content in a special database column instead of a cloud service.
The implementation is based on a standard Rails Active Storage service, adding a migration with a new model: an extra table that stores blob contents in a binary field. The service creates and destroys records in this table as requested by Active Storage.
Therefore, this service, once installed, can be consumed via a standard Rails Active Storage API.
https://github.com/TitovDigital/activestorage-database-service
Please be aware of all pros and cons of using a database for storing files.
With the right database it will provide full ACID support and can wrap file storage and deletion into transactions. It is also much easier in DevOps as there is one less service to configure.
Large files or large traffic are the risky cases. Either will put an unnecessary strain on the app and database servers.

Resources