Keeping users uploaded documents private - ruby-on-rails

I believe this question is platform/technology independent, however I am using Ruby on Rails with the carrierwave gem.
Users upload documents to my site, and I need to keep them private. I am exploring the different options available to me, along with their advantages and disadvantages.
Option 1
Obfuscate urls to images to make them impossible to guess.
This would be relatively simple to implement and fast to serve up. However, if a url was made public by whatever means, security is lost.
Option 2
Have documents accessed through some sort of intermediate step that requires authentication. This would have improved security over option 1, but would place additional load on the server. A page containing previews of a number of uploaded documents would hammer the server.
Are there any other options available to me? Have I made any mistakes with my claims, or missed any important points?

I think the best option you have is to have a "key" for your documents. You can generate a key, with a certain lifetime, and when you go on /document/name/access_key, you find the record matching and return the file associated with the record. Never exposing the real URL.

Related

How to manage files uploaded by users?

I have a web app which uses nginx. Suppose it's in Rails, but it doesn't really matter.
I'm planning to have around a hundred of pictures/files uploaded every day by users. I don't want to use any custom solution for storying and uploading images and instead I want to manage that myself.
1) Is there an idiomatic place/path where I should store those images? Or will any path within the reach of nginx will work?
2) Should it be a separate folder from the one that I'm use for storying images for CSS?
3) How would I organize folders forest? That is, should it be something like /my_base_image_folder/{year}/{month}/{day}/{image_sequence_number}.jpg?
Or maybe /my_base_image_folder/{article_id}/{image_sequence_number}.jpg? Or should I put them in the same folder `/my_base_image_folder/{img_guid}.jpg?
And why?
4) What's a recommended solution for naming uploading files? GUID? Or a sequence number?
It all depends on your use case for consuming those images again. If you're planning to retrieve images uploaded by the users for those users on later sessions, ie., if your users have to browse through their images in your application, it's better to store it in a_public_folder/user_id_hashed/. If you're planning to retrieve images based on the uploaded time or date, it's better if you go with a_public_folder/year/month/day/... Make sure the public folder is accessible by your ngnix by using not-too-open permission, just to be safe. Also, naming the image files, a combination of timestamp and a small random hex should do I guess, no big deal.

Storage of user data

When looking at how websites such as Facebook stores profile images, the URLs seem to use randomly generated value. For example, Google's Facebook page's profile picture page has the following URL:
https://scontent-lhr3-1.xx.fbcdn.net/hprofile-xft1/v/t1.0-1/p160x160/11990418_442606765926870_215300303224956260_n.png?oh=28cb5dd4717b7174eed44ca5279a2e37&oe=579938A8
However why not just organise it like so:
https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png
Clearly this would be much easier in terms of storage and simplicity. Am I missing something? Thanks.
Companies like Facebook have fairly intense CDNs. They may look like randomly generated urls but they aren't, each individual route is on purpose and programed to be handled in that manner.
They aren't after simplicity of storage like you would be if you were just using a FTP to connect to a basic marketing website server. While you may put all your images in a /images folder, Facebook is much too complex for this. Dozens of different types of applications accessing hundreds if not thousands of CDNs and servers world wide.
If you ever build a web app, such as a Ruby on Rails app, and you work with a services such as AWS (Amazon Web Services) you'll also encounter what seems like nonsensical urls. But it's all part of the fast delivery network provided within the architecture. Every time you "push" your app up to the server new urls are generated for each unique resource automatically, css files, JavaScript files, image files, etc all dynamically created. You don't have to type in each of these unique urls individually each time you publish the app, the code simply knows where to look for those as a part of the publishing process.
Example: you tell the web app to look for
//= require jquery
and it returns you http://example.com/assets/jquery-eb3e278249152b5b5d5170b73d9dbf52.js?body=1 in your header.
It doesn't matter that the url is more complex than it should be, the application recognizes it, and that's all that matters.
Simply put, I think it can boil down to two main reasons: Security and Cache:
Security - Adding these long unpredictable hashes prevent others from guessing photo URLs and makes it pretty hard to download photos you aren't supposed to.
Consider what would happen if I could easily guess your profile photo URL and download it, even when you explicitly chose to share it only with friends.
Cache - by adding "random" query params to each photo, you make sure each photo instance gets its own URL. Thus you can store the photo in browser's cache for a long time, knowing that whenever you replace it with a new one, the new photo will have a fresh URL and the browser won't keep showing you the old photo.
If you were to keep the same URL for each user's profile photo (e.g. https://scontent-lhr3-1.xx.fbcdn.net/{{ profile_id }}/50x50.png), and then upload a new photo, either one of these can happen:
If you stored the photo in browser's cache for a long time, the browser will keep showing you the cached version (as long as URL is the same, and cache hasn't expired, there's no need to re-download the image).
If, instead, you only keep the image in cache for short period of time, you end up hitting your server much more then actually needed, increasing the load and hurting performance.
I hope this clarifies it.
With your route scheme, how would you avoid strangers to access the pictures of a private account? The hash also prevent bots to downloads all the pictures.
I get your pain :-) I might not stay with describing how this problem could appear more, but rather let me speak of a solution. Well it is normal that in general code while dealing with hashed value or even base64ed value it seems likes mess to deal with, but with an identifier to explain along, it does not remain much!
I use to work in a company where we use to collate Facebook post, using Graph API get its Insights Object and extract information from it for easy passing around within UI and sending back to our Redis cache store; and once we defined a data-structure in TaffyDB how an object organization is going to look like, everything just made sense with its ability to query the useful finite from long junk looking stream of minified Javascript stream
Refer: http://www.taffydb.com/
The extra values in the URL are useful to:
Track access. This is like when a newspaper appends "&homepage" vs. "&email" to an article URL, so their system knows how a reader found the page.
Avoid abuse and control access. Imagine that a user loaded a small, popular pornographic image into a profile image. They could then hijack the CDN to be a free web host for their porn site. But that code is used internally by the CDN to limit the number of views.

ASP.NET MVC - Profile Image / File Organising

I'm currently building into my app a method to allow users to provide images for their profile.
As part of the upload process I'll be creating a couple of different versions of the file for use in various different places in the system.
I am not going to store the images in a database as that just doesn't make sense to me. They will be served up by IIS and I also want to structure it in such a way that it will make migrating it to a CDN in the future a lot easier.
My current plan is to assign each user a GUID or similar unique value that will be stored in the database as part of their profile information, so something like:
ProfileId 1 - UserCode 628B3AF30B0F48EA8D61778084FC73C3
When a user uploads their profile image (or other data) I'll use the code to create a directory on the web server, for instance:
"~/userimages/628B3AF30B0F48EA8D61778084FC73C3/"
Then I'll store a few images in here like:
"~/userimages/628B3AF30B0F48EA8D61778084FC73C3/profilephoto160x160.jpg"
"~/userimages/628B3AF30B0F48EA8D61778084FC73C3/profilephoto25x25.jpg"
This should then make the process of recreating valid urls to the correct images both not hard coded anywhere nasty (like a profile image url in the database) and should make it a predictable process, something like:
var profileImgurl = BaseImageFolder + profile.UserCode + "profilephoto160x160.jpg"
I have no experience of CDN and was wondering if I'm creating any problems I'll struggle to solve later down the line when/if it gets migrated?
Does anyone see any nasty problems with this approach that I haven't thought of?
As far as I can tell people like google have a similar approach to this.

Letting visitors try out user-only features that write to DB

My site lets people create database entries (which most rails apps do), and I realized that there's a huge drop-off from landing on the site to actually signing up to try it. Basically the service lets users build their own document by combining different components. I'm thinking about adding an interface where visitors who are not yet registered can try out the features (building stuff) and ask them to sign up at the last stage, when they're about to publish their document.
First thing that comes to mind is use HTML5 local storage, but then another idea came to mind: maybe I could create a temporary user whenever a visitor tries out the features, and then later remove them from the database if they don't sign in. I'm not sure if this is safe, but this seems like it might be easier than dealing with all the local storage issues.
What would be the best practice for this type of situation?
HTML5 storage would be an option, tho most likely a lot of client side coding.
Other options would be to have a duplicate table of these 'demo' documents which you can clear every now and again for users that did not sign up. You could also just store the document in the user session, as you don't need it permanently stored, and then store it in the database once they have signed up.

Accessing Word documents in a Rails app

I have a number of documents (mainly Word and Excel) that I'd like to make available to users of my Rails app. However, I've never tried something like this before and was wondering what the best way to do this was? Seeing as there will only be a small number of Word documents, and all will be uploaded by me, do I just store them somewhere in my Rails app (i.e. public/docs or similar) or should I set up a separate FTP and link to that? Perhaps there's an even better way of doing this?
If they're to be publically accessable, you definitely just want to stick them in public somewhere. Write a little helper to generate the URL for you based on however you want to refer to them in your app, for cleanliness (and so if you do change the URL later, for example to bucket your files to keep your directory sizes under control, you don't have to change links all over your app, just in one place.
If, on the other hand, your files are only for logged-in users, you'll need to use something like send_file to do the job, or one of the webserver-specific methods like the X-Sendfile header to check the user is authorised to view the file before sending it back to them.
I would do as you suggested and put them in public/docs. If you are planning on making an overview/index page for the files and link directly to them it would be easier if they were stored locally instead of a remote FTP server. However, since you are the one who will be uploading and maintaining these files, I think you should go with the option that's easiest for you.

Resources