Rails: what security issues come with extracting text from user-submitted files?

If users of an app are able to submit flat text files, and these files have data pulled from them by a program using a regex (which is then returned to the user), how can this be abused?
I know there are concerns with executable files or unsanitized filenames when they're being saved, but I don't know what the risks are with just opening and parsing a file that lasts temporarily in memory.
Thanks.

It depends very much on the implementation of this theoretical system. The big two vulnerabilities are:
SQL Injection. If you are committing this data to a database and do so in an improper manner, you could expose your database to whatever maliciously-formatted data the user uploads.
Cross-Site Scripting. If you're rendering the results of the upload as HTML, you potentially allow an XSS vulnerability if the results aren't properly escaped.
Proper handling of user input can reduce these problems. Generally though much depends on the actual implementation details of your code. If you're evaling user input, obviously, that's also an enormous security flaw... but it's not something we can see at this level of detail.
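As a minimal sketch of the escaping point above, assuming the program pulls "key: value" pairs out of the uploaded text and echoes them back as HTML: the `PAIR` pattern and both method names are hypothetical, while `CGI.escapeHTML` is from Ruby's standard library. Rails ERB views escape interpolated strings by default, so this mostly matters when you build markup by hand.

```ruby
require "cgi"

# Hypothetical extraction rule: "key: value" pairs, one per line.
PAIR = /^(\w+):[ \t]*(.+)$/

# Returns an array of [key, value] string pairs found in the upload.
def extract_pairs(text)
  text.scan(PAIR)
end

# Escape every extracted value before it is placed into markup.
def render_rows(pairs)
  pairs.map do |key, value|
    "<tr><td>#{CGI.escapeHTML(key)}</td><td>#{CGI.escapeHTML(value)}</td></tr>"
  end.join("\n")
end
```

The same rule applies to the database side: pass extracted values as bound parameters rather than interpolating them into SQL strings.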

Related

Best markup format for future-proofing large text chunks?

I have a number of records (<= 100) that contain sizeable chunks of text that require marking up (semantically: lists, headings, tables, links, quotations, etc.) before storing in a re-usable file format.
When stored, it is likely to remain more or less unchanged for as many years into the future as possible.
It contains some non-ascii, so UTF-8 is required. I started using HTML, then considered Markdown... but would like to know what people think is the most future-proof markup format for long-term storage? The content is initially for a (mostly static) website, but may be used as content for other outputs.
Finally, opinions on the choice of storage for long-term use - database, separate documents...? Changes to records will be infrequent and edited by only 1-3 people, and read access should increase over time.
Update:
I've finally chosen the common features (e.g. for tables) between MultiMarkdown, PHP Markdown Extra and Kramdown as the text format (Markdown omits too many HTML tags), and am converting the resulting files to html with Kramdown. Now I'm trying out iOS Markdown editors that can handle an extended Markdown and sync via Dropbox to my desk/laptop.
Any storage not designed for long-term archiving will break.
It is not so much a question of database vs. filesystem, but how to ensure that no (silent) data corruption happens, and how to migrate data. I can give you no definitive answers, because it depends on a lot of factors (incl. costs), but here are a few resources:
Building Better Long-Term Archival Storage System, Talk by Miller/Storer at the Library of Congress
The Digital Dilemma, Book aimed at movie archiving, but highlights some of the issues in long term archiving.
Project Honeycomb, a project by Sun for open-source long-term archiving, since discontinued.
I have no real answer to the format question, but I think HTML + UTF-8 should still be readable decades from now. Whatever you choose, document it.

Get size of folder and all contents for each user

I have seen in several SAAS sites where people store data and the site tracks how much data is on their servers. They display that information to the user.
Using Ruby on Rails, how do some of these sites do this? Enough sites do it that there must be a standard way I am just not aware of. Or does pretty much everyone implement their own solution?
If everyone implements their own solution, is it a good approach to use:
`du -s <directory>`
and just parsing the data.
It all depends on how the site stores the actual data, and the metadata about the objects that its users upload. It is entirely possible that the data isn't stored on a traditional file system at all (it may live in something like S3), and in cases like that du wouldn't work.
So if you store the metadata in the database, including the size, you can get the sum of the upload sizes with a single query, without having to hit the underlying filesystem.
du is a very good tool for this sort of thing, and it's probably faster than what you could roll by hand using Find, but you'll need to be careful when parsing the output.
It's not uncommon for directory names to contain exotic characters, from the obvious, like spaces, to the unusual, like newlines. This makes parsing the output of du somewhat unreliable if someone does this:
% mkdir "foo
1234 bar"
If that's not a big deal, forget about it. Otherwise you'll need to recurse and do the math manually, something that can take a while for the Ruby interpreter on large filesystems.
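The manual recursion could look something like the sketch below, using `Find` from Ruby's standard library. The method name `tree_size` is made up for illustration. Note one difference from du: this sums apparent file sizes in bytes, whereas du by default reports disk usage in blocks, so the two numbers will not match exactly.

```ruby
require "find"

# Sum the apparent size, in bytes, of every regular file under root.
# No output parsing is involved, so odd directory names (spaces,
# newlines) cannot confuse it.
def tree_size(root)
  total = 0
  Find.find(root) do |path|
    stat = File.lstat(path) # lstat: inspect symlinks themselves, do not follow them
    total += stat.size if stat.file?
  end
  total
end
```

On a large tree this walks every entry in Ruby, which is where the slowdown mentioned above comes from; caching the result per user, or keeping sizes in the database, avoids repeating the walk.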

How should I (intelligently) store and archive large xml files for a data import

We've got a rails app that processes large amounts of xml data imports. Right now we're storing these ~5MB xml docs in Postgres. This is not ideal given that we use each xml doc once or twice for parsing. We'd like to have an intelligent way of storing and archiving these docs, but not overly complicate the retrieval process for the sake of space. We've considered moving the docs to Mongo (which we're also using), but then aren't we just artificially boosting the memory requirements of our Mongo db servers?
What's the best way for us to deal with this?
I would just store a link to the file in the DB if you use it only once or twice for parsing, and then load the file from the given link. Another approach is to use an XML DB, e.g. eXist.
You could try eXist, an XML database. If you are just archiving them, though, why not just store them in a directory tree?
You may want to look into DB2's PureXML capabilities. To play with it, you can download the free DB2 Express-C version here. For the record, IBM is also the only database provider officially supporting their Ruby driver and Rails adapter, so you wouldn't be on your own.
What harm are they doing where they are? They will take up 'space' wherever you put them.
If you are confident you will never need them again, then there is a case for archiving them to less expensive storage (e.g. tape); otherwise, whatever you do will 'overly complicate the retrieval process'.
You could consider compressing them in place if you are not already doing so.
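A rough sketch of the compression idea, assuming the XML is held as a UTF-8 string and the compressed bytes go into a binary column or a file. The method names are hypothetical; `Zlib::Deflate` and `Zlib::Inflate` are from Ruby's standard library.

```ruby
require "zlib"

# Compress an XML document before storing it.
def pack_xml(xml)
  Zlib::Deflate.deflate(xml, Zlib::BEST_COMPRESSION)
end

# Restore the original document; inflate returns binary-encoded bytes,
# so the encoding is set back to UTF-8 (an assumption about the input).
def unpack_xml(blob)
  Zlib::Inflate.inflate(blob).force_encoding(Encoding::UTF_8)
end
```

XML is usually highly repetitive, so it tends to compress well, but the actual ratio depends on the documents.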

How to protect file system?

I want to store large files (videos, sounds) and access them via a database. I am currently torn between the filesystem (with references to the files stored in the DB) and a pure DB solution (which could become enormous over time).
I also have to protect the content, so I thought the DB solution would suit this purpose better (though it is probably not a good idea).
On the other hand, I got a hint to encrypt the files to protect them if I choose the filesystem.
How should I do this?
P.S
please see the question What database should I choose not to worry about size limit?
P.P.S
By protection I mean encrypting the video/sound file with an encryption algorithm. When the application needs to read the files, it has to decrypt them.
That way the stolen files are useless unless the appropriate decryption algorithm is available.
I thought of using RAR secured with a password. As far as I know it is very hard to break when the password is long enough. (Maybe I am wrong.)
I am not familiar with MD5.
I cannot protect the files against theft, but I want to prevent them from being read freely.
One approach would be to create a background process with an elevated security token that lets it access a section of the filesystem available only to administrators and to that process. On Windows you'd create a "service". They call it a "daemon" on *nix, I believe.
That service could then expose an API via pipes, sockets, or a shared memory region where the unelevated, user-mode database tool could get and set files.
There's no way to completely prevent system administrators from accessing a file directly, so if that is a requirement, you're out of luck. On Windows administrators have a special privilege that allows them to take ownership of any securable item such as a file or directory. Once they're owners, they can do anything they want to the securable item. There's just no way around that.
I have implemented both approaches in different projects with different requirements and constraints, and I would strongly recommend keeping all the content in the database, storing the media files in large blobs. Even though that will require very large tables, that should not be a problem for the latest versions of the most well-known databases.
I recommend DB2. DB2 since version 9 supports very large tables. The maximum is monstrously large. 512000 petabytes, half a zettabyte.
You need to accept that the choice between storing the files in the database vs. in the file system ultimately doesn't matter much - in both scenarios they can be read trivially from outside, unless there is some encryption. That moves the problem from where to store the data to how to store the secret key to decrypt encrypted data, in your application.
This is a hard problem. There's probably nothing you can develop that can't be cracked by a determined attacker in a rather short time. It depends on the audience of your program whether that's a real concern; if it is, then you can't do much. It takes a single successful crack to access your data and make it accessible to all interested in it. The attacker will go for the weakest part, which isn't the hard-to-break file encryption, but your application.
Use a DB (Firebird, or Firebird Embedded) and keep large content out of it.
If it is encryption you are worried about, do it at the filesystem level instead of the DB level.
All modern OSes support encrypted filesystems.
There are merits in both approaches.
If you want to have the files separately in the filesystem, I would encrypt each one individually. A password-protected RAR or ZIP file would be as good a method as any.
Use a different password/decryption key for each file, and store the password in the database along with each file name.
The following is suggested without knowing the specifics of your application.
As you mention that you cannot protect against theft, just about the only way to make sure the multimedia files are safe (unusable by anyone other than the "owner") is to encrypt them using a cipher like AES or Blowfish and a secure secret key. These algorithms are different from the MD5 that you mention: MD5 is a hash algorithm, not a cipher.
For Delphi, a rather good encryption library is DCPcrypt, found at http://www.cityinthesky.co.uk/cryptography.html. It has both hash and cipher algorithms.
Your problem will be cipher key management, namely "What password to use for encryption?". The simplest solution, without thinking much about it, would be to use the user's own password, until you realize that the user may change the password. If the user does that, you need to decrypt and re-encrypt every single multimedia file associated with the user.
To answer the actual question about key management I'd suggest reading up on key management. As I'm no expert on cryptography I hope someone more versed in the crypto world can help here... Link: http://en.wikipedia.org/wiki/Key_management.
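One common way around the re-encryption problem, offered here as a sketch rather than as what the answer above proposes, is to encrypt each file with its own random key and then encrypt ("wrap") only that small key with a key derived from the password. A password change then means re-wrapping a few bytes per file instead of re-encrypting the media. The example is in Ruby, to match the rest of this page, using the standard `openssl` library; the method names, the iteration count, and the hash layout are all assumptions made for illustration.

```ruby
require "openssl"
require "securerandom"

# Encrypt with AES-256-GCM; returns everything needed to decrypt later.
def seal(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  data = cipher.update(plaintext) + cipher.final
  { iv: iv, tag: cipher.auth_tag, data: data }
end

# Decrypt; raises OpenSSL::Cipher::CipherError on a wrong key or tampering.
def unseal(box, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.decrypt
  cipher.key = key
  cipher.iv = box[:iv]
  cipher.auth_tag = box[:tag]
  cipher.update(box[:data]) + cipher.final
end

# Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256.
# The iteration count is an illustrative assumption, not a recommendation.
def password_key(password, salt)
  OpenSSL::PKCS5.pbkdf2_hmac(password, salt, 100_000, 32, OpenSSL::Digest.new("SHA256"))
end

# On a password change, only the wrapped file key is re-encrypted.
def rewrap(wrapped_key, old_password, new_password, salt)
  file_key = unseal(wrapped_key, password_key(old_password, salt))
  seal(file_key, password_key(new_password, salt))
end
```

In use, each file would get `file_key = SecureRandom.random_bytes(32)`, the media would be stored as `seal(media_bytes, file_key)`, and the database would hold `seal(file_key, password_key(password, salt))` along with the salt. Where the password or derived key lives while the application runs is still the hard part that the answers here describe.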
I think it all depends on a few things:
Where do the files come from initially
How will the files be accessed (over the network or locally)
Who do you want to protect the files from
If the initial files come from "you" (the developer), then encrypted database blobs would probably be the best way, as most databases come with some form of encryption.
If the files come from the user of the software, then using the file system would suffice, possibly using a temporary directory to store and retrieve files if used over the network, but actually storing the files in a non-shared location so that network users don't have access to all the files.
Just a few thoughts.
Local file protection, web project protection, secure email, record-level security in databases (DB2, Oracle, Firebird, MS SQL, MySQL, etc.), creating encryption systems from primitives, etc. Look for File Protect System - SE or SPE or PE.

Rails: Storing binary files in database [closed]

Using Rails, is there a reason why I should store attachments (which could be files of any type) in the filesystem instead of in the database? The database seems simpler to me: no need to worry about filesystem paths, structure, etc., you just look in your blob field. But so many people seem to use the filesystem that I'm left guessing there must be some benefits to doing so that I'm not getting, or some disadvantages to using the database for such storage. (In this case, I'm using Postgres.)
This is a pretty standard design question, and there isn't really a "one true answer".
The rule of thumb I typically follow is "data goes in databases, files go in files".
Some of the considerations to keep in mind:
1. If a file is stored in the database, how are you going to serve it out via http? Remember, you need to set the content type, filename, etc. If it's a file on the filesystem, the web server takes care of all that stuff for you. Very quickly and efficiently (perhaps even in kernel space), no interpreted code needed.
2. Files are typically big. Big databases are certainly viable, but they are slow and inconvenient to back up etc. Why make your database huge when you don't have to?
3. Much like 2., it's really easy to copy files to multiple machines. Say you're running a cluster, you can just periodically rsync the filesystem from your master machine to your slaves and use standard static http serving. Obviously databases can be clustered as well, it's just not necessarily as intuitive.
4. On the flip side of 3, if you're already clustering your database, then having to deal with clustered files in addition is administrative complexity. This would be a reason to consider storing files in the DB, I'd say.
5. Blob data in databases is typically opaque. You can't filter it, sort by it, or group by it. That lessens the value of storing it in the database.
6. On the flip side, databases understand concurrency. You can use your standard model of transaction isolation to ensure that two clients don't try to edit the same file at the same time. This might be nice. Not to say you couldn't use lockfiles, but now you've got two things to understand instead of one.
7. Accessibility. Files in a filesystem can be opened with regular tools. Vi, Photoshop, Word, whatever you need. This can be convenient. How are you gonna open that word document out of a blob field?
8. Permissions. Filesystems have permissions, and they can be a pain in the rear. Conversely, they might be useful to your application. Permissions will really bite you if you're taking advantage of 7, because it's almost guaranteed that your web server runs with different permissions than your applications.
9. Caching (from sarah mei below). This plays into the http question above on the client side (are you going to remember to set lifetimes correctly?). On the server side files on a filesystem are a very well-understood and optimized access pattern. Large blob fields may or may not be optimized well by your database, and you're almost guaranteed to have an additional network trip from the database to the web server as well.
In short, people tend to use filesystems for files because they support file-like idioms the best. There's no reason you have to do it though, and filesystems are becoming more and more like databases so it wouldn't surprise me at all to see a complete convergence eventually.
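To make the point above about serving a database-stored file over http more concrete, here is a framework-free sketch of what the application has to assemble itself. The `StoredFile` struct and the method name are hypothetical, and the return value follows the Rack convention of `[status, headers, body]`. In a Rails controller, `send_data` with its `type:` and `filename:` options covers the same ground.

```ruby
# Hypothetical record shape: what a row holding a blob might expose.
StoredFile = Struct.new(:filename, :content_type, :data)

# Build a Rack-style response for a file held in memory.
def rack_response_for(file)
  # Replace anything outside a conservative whitelist so the header stays well-formed.
  safe_name = file.filename.gsub(/[^\w.\-]/, "_")
  headers = {
    "content-type" => file.content_type,
    "content-length" => file.data.bytesize.to_s,
    "content-disposition" => %(attachment; filename="#{safe_name}")
  }
  [200, headers, [file.data]]
end
```

Cache headers such as lifetimes would also have to be added by hand here, whereas a web server serving static files sets sensible ones for you.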
There's some good advice about using the filesystem for files, but here's something else to think about. If you are storing sensitive or secure files/attachments, using the DB really is the only way to go. I have built apps where the data can't be put out on a file. It has to be put into the DB for security reasons. You can't leave it in a file system for a user on the server/machine to look at or take with them without proper security. Using a high-class DB like Oracle, you can lock that data down very tightly and ensure that only appropriate users have access to that data.
But the other points made are very valid. If you're simply doing things like avatar images or non-sensitive info, the filesystem is generally faster and more convenient for most plugin systems.
The DB is pretty easy to set up for sending files back; it's a little bit more work, but just a few minutes if you know what you're doing. So yes, the filesystem is the better way to go overall, IMO, but the DB is the only viable choice when security or sensitive data is a major concern.
I don't see what the problem with blobstores is. You can always reconstruct a file system store from it, e.g. by caching the stuff to the local web server while the system is being used.
But the authoritative store should always be the database. Which means you can deploy your application by tossing in the database and exporting the code from source control. Done.
And adding a web server is no issue at all.
Erik's answer is great. I will also add that if you want to do any caching, it's much easier and more straightforward to cache static files than to cache database contents.
If you use a plugin such as Paperclip, you don't have to worry about anything either. There's this thing called the filesystem, which is where files should go. Just because it is a bit harder doesn't mean you should put your files in the wrong place. And with Paperclip (or other similar plugins) it isn't hard. So, gogo filesystem!
Unable to find an up-to-date answer to this question, I have implemented a database service for Active Storage (available since Rails 5.2). It works just like any other Active Storage service, but stores file content in a special database column instead of a cloud service.
The implementation is based on a standard Rails Active Storage service, adding a migration with a new model: an extra table that stores blob contents in a binary field. The service creates and destroys records in this table as requested by Active Storage.
Therefore, this service, once installed, can be consumed via a standard Rails Active Storage API.
https://github.com/TitovDigital/activestorage-database-service
Please be aware of all pros and cons of using a database for storing files.
With the right database it will provide full ACID support and can wrap file storage and deletion into transactions. It is also much easier in DevOps as there is one less service to configure.
Large files or large traffic are the risky cases. Either will put an unnecessary strain on the app and database servers.
