Rails: Storing binary files in database [closed] - ruby-on-rails

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
Using Rails, is there a reason why I should store attachments (could be a file of any time), in the filesystem instead of in the database? The database seems simpler to me, no need to worry about filesystem paths, structure, etc., you just look in your blob field. But most people seem to use the filesystem that it leaves me guessing that there must be some benefits to doing so that I'm not getting, or some disadvantages to using the database for such storage. (In this case, I'm using postgres).

This is a pretty standard design question, and there isn't really a "one true answer".
The rule of thumb I typically follow is "data goes in databases, files go in files".
Some of the considerations to keep in mind:
If a file is stored in the database, how are you going to serve it out via http? Remember, you need to set the content type, filename, etc. If it's a file on the filesystem, the web server takes care of all that stuff for you. Very quickly and efficiently (perhaps even in kernel space), no interpreted code needed.
Files are typically big. Big databases are certainly viable, but they are slow and inconvenient to back up etc. Why make your database huge when you don't have to?
Much like 2., it's really easy to copy files to multiple machines. Say you're running a cluster, you can just periodically rsync the filesystem from your master machine to your slaves and use standard static http serving. Obviously databases can be clustered as well, it's just not necessarily as intuitive.
On the flip side of 3, if you're already clustering your database, then having to deal with clustered files in addition is administrative complexity. This would be a reason to consider storing files in the DB, I'd say.
Blob data in databases is typically opaque. You can't filter it, sort by it, or group by it. That lessens the value of storing it in the database.
On the flip side, databases understand concurrency. You can use your standard model of transaction isolation to ensure that two clients don't try to edit the same file at the same time. This might be nice. Not to say you couldn't use lockfiles, but now you've got two things to understand instead of one.
Accessibility. Files in a filesystem can be opened with regular tools. Vi, Photoshop, Word, whatever you need. This can be convenient. How are you gonna open that word document out of a blob field?
Permissions. Filesystems have permissions, and they can be a pain in the rear. Conversely, they might be useful to your application. Permissions will really bite you if you're taking advantage of 7, because it's almost guaranteed that your web server runs with different permissions than your applications.
Cacheing (from sarah mei below). This plays into the http question above on the client side (are you going to remember to set lifetimes correctly?). On the server side files on a filesystem are a very well-understood and optimized access pattern. Large blob fields may or may not be optimized well by your database, and you're almost guaranteed to have an additional network trip from the database to the web server as well.
In short, people tend to use filesystems for files because they support file-like idioms the best. There's no reason you have to do it though, and filesystems are becoming more and more like databases so it wouldn't surprise me at all to see a complete convergence eventually.

There's some good advice about using the filesystem for files, but here's something else to think about. If you are storing sensitive or secure files/attachments, using the DB really is the only way to go. I have built apps where the data can't be put out on a file. It has to be put into the DB for security reasons. You can't leave it in a file system for a user on the server/machine to look at or take with them without proper securty. Using a high-class DB like Oracle, you can lock that data down very tightly and ensure that only appropriate users have access to that data.
But the other points made are very valid. If you're simply doing things like avatar images or non-sensitive info, the filesystem is generally faster and more convenient for most plugin systems.
The DB is pretty easy to setup for sending files back; it's a little bit more work, but just a few minutes if you know what you're doing. So yes, the filesystem is the better way to go overall, IMO, but the DB is the only viable choice when security or sensitive data is a major concern.

I don't see what the problem with blobstores is. You can always reconstruct a file system store from it, e.g. by caching the stuff to the local web server while the system is being used.
But the authoritative store should always be the database. Which means you can deploy your application by tossing in the database and exporting the code from source control. Done.
And adding a web server is no issue at all.

Erik's answer is great. I will also add that if you want to do any caching, it's much easier and more straightforward to cache static files than to cache database contents.

If you use a plugin such as Paperclip, you don't have to worry about anything either. There's this thing called the filesystem, which is where files should go. Just because it is a bit harder doesn't mean you should put your files in the wrong place. And with paperclip (or other similar plugins) it isn't hard. So, gogo filesystem!

Unable to find an up-to-date answer to this question I have implemented an
database service for Active Storage available since Rails 5.2 that works just like any other Active Storage service, but stores file content in a special database column instead of a cloud service.
The implementation is based on a standard Rails Active Storage service, adding a migration with a new model: an extra table that stores blob contents in a binary field. The service creates and destroys records in this table as requested by Active Storage.
Therefore, this service, once installed, can be consumed via a standard Rails Active Storage API.
https://github.com/TitovDigital/activestorage-database-service
Please be aware of all pros and cons of using a database for storing files.
With the right database it will provide full ACID support and can wrap file storage and deletion into transactions. It is also much easier in DevOps as there is one less service to configure.
Large files or large traffic are the risky cases. Either will put an unnecessary strain on the app and database servers.

Related

How to store "large" amounts of data in a desktop app?

Update
On a completely unrelated search, I have found this: Lightweight SQL database which doesn't require installation which lists a few possible options with about the same goals.
Original post
We have a desktop .Net/WPF app, with large (for a desktop app) amounts of data stored: it's has layouts, templates, products list and technical specs, and many more stuff.
Today it's stored in an Access DB, but we have hit the Access limitations pretty hard: it's not very fast, the DB weights 44Mb (which results in a large setup package), and more importantly, it's a pain to use with version control, because we can't merge the data from a branch to another. For instance, we might create a branch to add a few products, but then we have to add them manually in the trunk when we merge. We could use SQL scripts, but writing advanced SQL scripts for Access is a pain.
Basically, I want to replace the MS Access DB with another storage format, because Access is not well adapted.
I had tought of using JSON files that would be unzipped during or after install, but I'm a bit afraid of performance problems.
I'm also thinking of splitting the data into multiple files with multiple formats, depending on its usage, but using different formats might get complicated or annoying to develop.
Performance
Some parts of the DB are accessed pretty often and should be performance-optimized, whereas others are accessed maybe 1 or 2 times per work session, and using a poor-performance but high-compression format could be OK.
Size
We want the installer to be the smallest possible, so the library should be small, and the format should use small files. Using a library that adds 5 Mb to the installer file is out of the question.
Compatibility
The software must be able to run on .Net 4 (not 4.5), and it would be great if it ran on Windows XP (even though we're thinking more and more of just abandoning it going forward, it's still more than 7% of our market share).
Moreover, it should not need to install a server (like MS Access or SQLite) because it will be installed on end-user's computers, and we don't want to bloat them.
Versionning
It should be easy to version the data and the DB structure. The file should either be a text file (like JSON), or scripts should be easy to run in the continuous integration platform (like SQL server).
So, which technology would you use that answers all these contraints ?
Thanks !
As for your version control pains, it sounds from your description that if I were you, I'd keep the raw data in text files that are version-controlled, and only have the build process produce the database from them. This way, you should be able to use SQLite.
I would go for SQLite in your case, since the files are self-contained and easy to locate (hence easy to save on a version control system), installer is small, and performance is good.
http://www.sqlite.org/

How videos are stored on web server these days?

I'm building a web app that need to store some resources, including but not limited to articles, pictures and videos. My question here is how videos (mp4/ogg) are stored on web server? just as bare file or as binaries in relational or nosql db?
The question to BLOB data almost always comes down to "don't BLOB data". There are very few times that make more sense to write a database connector for your data then to just keep it on disk.
The general trend is to use an established service that employs good design patterns, such as Paperclip for ruby, and tailor it to your needs.
Using an external storage service is also a good idea, for example Amazon S3 will store all of your data for pennies on the dollar per gigabyte, and they'll do an excellent job of it.
If you do decide to cook up your own server that handles data internally, might I recommend digital ocean? I have been very happy with the SSD servers I have setup there (which are super fast).
For video you will almost certainly need a webserver that is capable of streaming the file. I think Nginx has this feature.
I think you need to elaborate a bit about the use case you wish to implement for this app. Only then you can have precise answer.
And to to help out with that, here are some questions you need to ponder:
1- You said you wanted to store videos, what are your requirements beyond storage?
2- do you wish for example to offer access to third party users to these videos and search with keywords?
3- If yes, what kind of information is available about the videos? what is the expected average size of these files?
Many database engines offer the possibility of storing big binary files, but that comes with an impact on performance. That's why most of the storage systems that deal with big files, store the files themselves on the disk and any related metadata (file name, last updated, associated keywords, etc.) are stored in the database. That makes for a scalable system.
I'll edit this answer, if you find it useful and have further related-questions.
An unlimited file storage is difficult to setup without AWS S3. S3 is cheap and scalable solution but expensive to use without proper caching, so we have Nginx S3 proxy that works well: https://stackoverflow.com/a/44749584/290338

Rails: Caching a Tree in memory on the server

I have a postgresql database which contains multidimensional data. What I did was I wrote a data structure that sorts all database rows into a tree format. Now the database is large and so I dont want to generate the tree every time a request comes in from a browser. What Id like to do is construct the tree once in a certain time period and persist it in memory on the server.
The tree is read only by the way. So now each time a request comes in the tree need not be generated new, its already there.
How can I make this happen. Im not an expert programmer, just a beginner and definitely new to web programming. So some of these concepts are new to me.
But if you could please point me in the right direction in terms of the concepts involved here, I can google the rest.
Or if you have actual links or examples that would be fantastic.
Thanks
There are several ways to approach this problem. It depends on just how close to the application you want the variables. If you're really looking to have them right "on top" of the application, for fastest possible use, then you could look at using a global variable "$tree" and hooking in to the application flow. Other options might include memcached, which is still pretty darn close to the application. Redis would be a good option for an in-memory database that could be shared between instances of an application, as it is a NoSQL database that you query. Not quite as close to the application though.
Generally, those are your primary options. In-application variables that survive requests. Application frameworks that will help variables survive requests and provide you a querying mechanism. Or, an In-Memory databases that will allow you to store and query rapidly from multiple instances. Each is a viable option, though I'm pretty sure you'd get a lot of 'community' flack for using a straight up global variable (such practices are considered unclean for their lack of thread-safety and other such concerns).

What design considerations should one take to receive text and multiple attachments via web?

I am developing a web application to accept a bunch of text and attachments (1 or more) via email, web and other methods.
I am planning to build a single interface, mostly a web service to accept this content.
What design considerations should I make?
I am building the app using ASP.NET MVC 2.
Should the attachments be saved to disk or in the database?
Should the unified single interface be a web service?
Pros and cons to using web services to upload files
as with any acceptance of files i'd be checking them for viruses or the like. i'm very nervous about files transmitted from the internet.
i always like putting my files in a database because it's neater i find. i hate having files over the network with folders needing rights etc. i know there are people that prefer it the other way so i guess my answer is also depends on personal preference.
i like the db approach because i can more easily tie files to records and do searches. if you have a file system then you still need to store info about the file plus the extra work of storing it.
then if you need to move files around you also need to possibly modify references in the database.
then again, you need to allocate enough space to grow the database and then cater for multiple databases perhaps as storage runs out.
so i guess if you're downloading large files then yeah maybe i can see the point of a file system as it's easier to grow it. if you have small text files then maybe a database will work.

How to protect file system?

I want to store large files (videos, sounds) and access them via Database. I am balancing now between filesystem (references to files would be stored in DB) and pure DB (which could be enormously large after time).
I have to protect the content too, so I thought, that DB solution suits better for this purpose. (probably it is not a good idea).
On the other hand I have got hint to encrypt files to protect them, if I choose to use file system.
How should I do this?
P.S
please see the question What database should I choose not to worry about size limit?
P.P.S
Under protection I mean encrypt videofile/soundfile using a crypt algorithm. When the application need to read them, it have to decrypt files...
In that way the stolen files are useless unless appropriate decrypt algorithm is present.
I thought to use RAR secured with password. As far as I know it is very hard to break it, when password is long enough. (Maybe I am wrong).
I am not familiar about MD5....
I can not protect files against theft, but I want to prohibit to read it freely.
One approach would be to create a background process with an elevated security token that would it access a section of the filesystem only available to administrators and that process. On Windows you'd create a "service". They call it "daemon" on *nix, I believe.
That service could then expose an API via pipes, sockets, or a shared memory region where the unelevated, user-mode database tool could get and set files.
There's no way to completely prevent system administrators from accessing a file directly, so if that is a requirement, you're out of luck. On Windows administrators have a special privilege that allows them to take ownership of any securable item such as a file or directory. Once they're owners, they can do anything they want to the securable item. There's just no way around that.
I have implemented both approaches in different projects with different requirements and constraints. And I would strongly recommend to keep all the contents in the database, storing the media files in large blobs. Eventhough that will require very large tables, that should not be a problem for the latest versions of the most well known databases.
I recommend DB2. DB2 since version 9 supports very large tables. The maximum is monstrously large. 512000 petabytes, half a zettabyte.
You need to accept that the choice between storing the files in the database vs. in the file system ultimately doesn't matter much - in both scenarios they can be read trivially from outside, unless there is some encryption. That moves the problem from where to store the data to how to store the secret key to decrypt encrypted data, in your application.
This is a hard problem. There's probably nothing you can develop that can't be cracked by a determined attacker in a rather short time. It depends on the audience of your program whether that's a real concern; if it is, then you can't do much. It takes a single successful crack to access your data and make it accessible to all interested in it. The attacker will go for the weakest part, which isn't the hard-to-break file encryption, but your application.
Use a DB (firebird or -embedded) and keep large content out of it.
If it is encryption you are worrying about, do this at the filesystem level instead of the db level.
All modern OSes have support for encrypted filesystems
There are merits in both approaches.
If you want to have the files separately in the filesystem, I would encrypt each one individually. A password-protected RAR or ZIP file would be as good a method as any.
Use a different password/decryption key for each file, and store the password in the database along with each file name.
The following is suggested without knowing the specifics of your application.
As you mention that you can not protect against theft, just about the only option to make sure the multimedia files are safe (unusable) by anyone other than the "owner" you need to encrypt them using a cipher like AES or BlowFish and a secure secret Key. These algorithms are different from MD5 that you mention. MD5 is a HASH algorithm.
For Delphi a rather good encryption library is DCPcrypt forund at http://www.cityinthesky.co.uk/cryptography.html. If has both HASH and Cipher algorithms.
Your problem will be Cipher KEY Management, namely "What password to use for encryption?". The simplest solution without thinking about this would be to use the users own password, untill you realize that the user may change the password. If the user does that you need to Decrypt and ReEncrypt every single multimedia file associated with the user.
To answer the actual question about key management I'd suggest reading up on key management. As I'm no expert on cryptography I hope someone more versed in the crypto world can help here... Link: http://en.wikipedia.org/wiki/Key_management.
I think it all depends on a few things:
Where do the files come from initially
How will the files be accessed (over the network or locally)
Who do you want to protect the files from
If the initial files come from "you" (the developer) then encrypted database blobs would probably be the best way as most dbs' come with some form of encryption.
If the files come from the user of the software, then using the file system would suffice - possibly using an temporary directory to store and retrieve files if used over the network, but actually storing the files in a non-shared location so that network users don't have access to all the files.
Just a few thoughts.
Local file protection, web project protection, and secure email, Record-level security in databases (DB2, Oracle, Firebird, MS SQ:, My SLQ etc.). Creating encryption systems from primitives etc. Look for File Protect System - SE or SPE or PE.

Resources