Web application security concerns with user-uploaded files on Amazon S3? - ruby-on-rails

Background
I have a web application where users may upload a wide variety of files (e.g. jpg, png, r, csv, epub, pdf, docx, nb, tex, etc). Currently, we whitelist exactly which file types a user may upload. This limitation is sometimes annoying for users (i.e. because they must zip disallowed files, then upload the zip) and for us (i.e. because users write to support asking for additional file types to be whitelisted).
Ideal Solution
Ideally, I'd love to whitelist files more aggressively. Specifically, I'd like to (1) figure out which file types may be trusted and (2) whitelist them all. Having a larger whitelist would be more convenient for users and it would reduce the number of support tickets (if only very slightly).
What I Know
I've done a few hours of research and have identified common problems (e.g. path traversal, placing assets in the root directory, htaccess vulnerabilities, failure to validate mime type, etc). While this research has been interesting, my understanding is that many of these issues are moot (or considerably mitigated) if your assets are stored on Amazon S3 (or a similar cloud storage service), which is how most modern web applications manage user-uploaded files.
Hasn't this question already been asked a zillion times?!
Please don't mistake this as a general "What are the security risks of user-uploaded content?" question. There are already many questions like that and I don't want to rehash that discussion here.
More specifically, my question is, "What risks, if any, exist given a conventional / modern web application setup?" In other words, I don't care about some old PHP app or vulnerabilities related to IE6. What should I be worried about assuming files are stored in a cloud service like Amazon S3?
Context about infrastructure / architecture
So... To answer that, you'll probably need more context about my setup. That said, I suspect this is a relatively common setup and therefore hope the answers will be broadly useful to anyone writing a modern web application.
My stack
Ruby on Rails application, hosted on Heroku
Users may upload a variety of files (via Paperclip)
Server validates both mime type and extension (against a whitelist)
Files are stored on Amazon S3 (with varying ACL permissions)
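As a rough illustration of the "validate both mime type and extension" step above, here is a minimal plain-Ruby sketch. It is not Paperclip's API; the ALLOWED_UPLOADS table and the allowed_upload? helper are hypothetical names. It also assumes the content type passed in has been determined on the server, since a type reported by the client can be forged.

```ruby
# Hypothetical allowlist: extension => content types accepted for it.
ALLOWED_UPLOADS = {
  ".jpg"  => ["image/jpeg"],
  ".png"  => ["image/png"],
  ".csv"  => ["text/csv", "text/plain"],
  ".pdf"  => ["application/pdf"],
  ".epub" => ["application/epub+zip"]
}.freeze

# Returns true only when the extension is on the allowlist AND the content
# type is one of the types expected for that extension.
def allowed_upload?(filename, content_type)
  ext = File.extname(filename.to_s).downcase
  ALLOWED_UPLOADS.fetch(ext, []).include?(content_type.to_s.downcase)
end
```

Checking the pair together, rather than each value on its own, rejects a file such as photo.jpg that arrives with a text/html content type.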
When a user uploads a file...
The file is uploaded directly to a tmp folder on AS3 (it hasn't touched my server yet)
My server then downloads the file from the tmp folder on AS3.
Paperclip runs validations and executes any processing (e.g. cutting thumbnails of images)
Finally, Paperclip places the file(s) back on AS3 in their new, permanent location.
When a user downloads a file...
User clicks to download a file, which sends a request to my API (e.g. /api/article/123/download)
Internally, my API reads the file from AS3 and then serves it to the user (with Content-Disposition: attachment)
Thus the file does briefly pass through my server (i.e. not merely a redirect)
From the user's perspective, the file is served from my API (i.e. the user has no idea the file lives on AS3)
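Since the download steps above serve user-supplied files from the application's own origin, the response headers matter. The sketch below shows one way to build headers that ask the browser to save the file instead of rendering it. The download_headers helper is a hypothetical name, and the choice of a generic application/octet-stream type is an assumption, not something the setup above is known to do.

```ruby
# Builds response headers that make the browser save the file rather than
# render it. The filename is first reduced to a conservative ASCII subset.
def download_headers(filename)
  safe_name = File.basename(filename.to_s).gsub(/[^A-Za-z0-9._-]/, "_")
  # Fall back to a fixed name for empty names and dotfiles.
  safe_name = "download" if safe_name.empty? || safe_name.start_with?(".")
  {
    "Content-Type"           => "application/octet-stream",
    "Content-Disposition"    => %(attachment; filename="#{safe_name}"),
    # Tells the browser not to guess a different type from the file's bytes.
    "X-Content-Type-Options" => "nosniff"
  }
end
```

With headers like these, an uploaded HTML or JS file should be downloaded rather than executed in the context of the site.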
Questions
Given this setup, is it safe to whitelist a wide range of file types?
Are there some types of files that are always best avoided (e.g. JS files)?
Are there any glaring flaws in my setup? I suspect not, but if so, please alert me!

Related

File access control in Rails

I have a web application which allows users to upload files and share them with other people across the internet. Anyone who has access can download the files, but if the uploader doesn't specifically share the file with someone else, that person can't download the files.
Since the user permissions are controlled by rails, each time someone tries to download a file it is sent to the user from a rails process. This is a serious bottleneck - rails is needed for the file upload and permissions but it shouldn't be in the way taking up memory just for others to download files.
I would like to split the application on different servers for the frontend, database and file server. If the user goes to my site, they should have the ability to download the file directly from something like my-fileserver.domain.com/file/38183 instead of running it through rails.
What is the best option for this? I would like to control file access at the database level, not the file system - but I don't want rails taking up all of the memory on my system for such a simple process. Any ideas?
Edit:
One thing I may be able to do is load a list of files/permissions from mysql into a node.js app and give access rights to the file server as a true/false response based on what the file server sends in. This still requires the file server to run a web server, however.
Maybe you could generate a random URL for each file and control access to it from a central system.
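One way to read that suggestion is a signed, expiring URL: the Rails app checks permissions in the database and hands out a link, and the file server only verifies the signature. The sketch below swaps the "random URL" for an HMAC signature so the file server needs no database lookup. SIGNING_KEY, the /file/... path layout, and all helper names are assumptions for illustration.

```ruby
require "openssl"
require "securerandom"

# Hypothetical secret shared by the app and the file server.
SIGNING_KEY = SecureRandom.hex(32)

def sign(file_id, expires_at, key = SIGNING_KEY)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), key, "#{file_id}:#{expires_at}")
end

# The app calls this after its own permission check has passed.
def signed_download_path(file_id, ttl: 300, now: Time.now.to_i)
  expires_at = now + ttl
  "/file/#{file_id}?expires=#{expires_at}&sig=#{sign(file_id, expires_at)}"
end

# The file server calls this; it needs only the key, not the database.
def valid_download?(file_id, expires_at, sig, now: Time.now.to_i)
  return false if expires_at.to_i < now
  expected = sign(file_id, expires_at.to_i)
  # Timing-safe comparison of the supplied and expected signatures.
  OpenSSL.secure_compare(sig.to_s, expected)
end
```

The link stops working after ttl seconds, so a shared URL cannot be reused indefinitely by someone without access.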

How can I accept large file uploads of around 250mb?

How can I accept large file uploads of around 250mb ?
http://dropitto.me/ seems ok, but it only allows up to 75MB uploads, it requires your actual dropbox account password, and it does not use HTTPS for authentication - so a few red flags there.
I have a Dropbox Pro account and EC2 and S3 resources. I'm looking for a method to allow non-technical users to send files between 100 - 250mb.
I'm not crazy about using FTP because I think it might be too technical for some users to set up. One option might be to ask them to register for a dropbox.com account and install the client and share a folder. Or share a folder with them to initiate the process.
But the real reason I'm asking this on StackOverflow is because I hope that there is some library that is really useful for doing this kind of stuff - and would be fast for the end users since the backbone could be "on the cloud". I don't really care what language it's written in.
Also let me just say that I'm not crazy about the idea of using rapidshare.com or megaupload.com or a service like that, but let me know if you would support those as the solution.
Check out http://kicksend.com - they allow up to a certain amount in the browser, but they have applications for mac/windows which allow for basically unlimited file transfer very easily.

Why would you upload assets directly to S3?

I have seen quite a few code samples/plugins that promote uploading assets directly to S3. For example, if you have a user object with an avatar, the file upload field would load directly to S3.
The only way I see this being possible is if the user object is already created in the database and your S3 bucket + path is something like
user_avatars.domain.com/some/id/partition/medium.jpg
But then if you had an image tag that tried to access that URL when an avatar was not uploaded, it would yield a bad result. How would you handle checking for existence?
Also, it seems like this would not work well for most has many associations. For example, if a user had many songs/mp3s, where would you store those and how would you access them.
Also, your validations will be shot.
I am having trouble thinking of situations where direct upload to S3 (or any cloud) is a good idea and was hoping people could clarify either proper use cases, or tell me why my logic is incorrect.
Why pay for storage/bandwidth/backups/etc. when you can have somebody in the cloud handle it for you?
S3 (and other Cloud-based storage options) handle all the headaches for you. You get all the storage you need, a good distribution network (almost definitely better than you'd have on your own unless you're paying for a premium CDN), and backups.
Allowing users to upload directly to S3 takes even more of the bandwidth load off of you. I can see the tracking concerns, but S3 makes it pretty easy to handle that situation. If you look at the direct upload methods, you'll see that you can force a redirect on a successful upload.
Amazon will then pass the following to the redirect handler: bucket, key, etag
That should give you what you need to track the uploaded asset after success. Direct uploads give you the best of both worlds. You get your tracking information and it unloads your bandwidth.
Check this link for details: Amazon S3: Browser-Based Uploads using POST
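To make the browser-based POST flow a little more concrete, here is a minimal sketch of building the policy document that accompanies such an upload form. It only constructs and Base64-encodes the policy; signing it and rendering the form fields are left out, and the Amazon documentation should be treated as the authority on those steps. The s3_post_policy helper and its parameters are hypothetical names.

```ruby
require "json"
require "time"

# Builds the Base64-encoded policy document for a browser-based POST upload.
# The conditions restrict which bucket and key prefix the browser may write
# to, cap the file size, and pin the redirect target used after success.
def s3_post_policy(bucket:, key_prefix:, redirect_url:, max_bytes:, expires_at:)
  policy = {
    "expiration" => expires_at.getutc.iso8601,
    "conditions" => [
      { "bucket" => bucket },
      ["starts-with", "$key", key_prefix],
      { "success_action_redirect" => redirect_url },
      ["content-length-range", 0, max_bytes]
    ]
  }
  # "m0" is Base64 without line breaks.
  [JSON.generate(policy)].pack("m0")
end
```

After a successful upload, the redirect handler receives bucket, key, and etag as query parameters, which is where the application can record the new asset.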
If you are hosting your Rails application on Heroku, the reason could very well be that Heroku doesn't allow file-uploads larger than 4MB:
http://docs.heroku.com/s3#direct-upload
So if you would like your users to be able to upload large files, this is the only way forward.
Remember how web servers work.
Unless you're using a sort of async web setup like you could achieve with Node.JS or Erlang (just 2 examples), then every upload request your web application serves ties up an entire process or thread while the file is being uploaded.
Imagine that you're uploading a file that's several megabytes large. Most internet users don't have tremendously fast uplinks, so your web server spends a lot of time doing nothing. While it's doing all of that nothing, it can't service any other requests. Which means your users start to get long delays and/or error responses from the server. Which means they start using some other website to get the same thing done. You can always have more processes and threads running, but each of those costs additional memory which eventually means additional $.
By uploading straight to S3, in addition to the bandwidth savings that Justin Niessner mentioned and the Heroku workaround that Thomas Watson mentioned, you let Amazon worry about that problem. You can have a single-process webserver effectively handle very large uploads, since it punts that actual functionality over to Amazon.
So yeah, it's more complicated to set up, and you have to handle the callbacks to track things, but if you deal with anything other than really small files (and even in those cases), why cost yourself more money?
Edit: fixing typos

mod_xsendfile alternatives for a shared hosting service without it

I'm trying to log download statistics for .pdfs and .zips (5-25MB) in a rails app that I'm currently developing and I just hit a brick wall; I found out our shared hosting provider doesn't support mod_xsendfile. The sources I've read state that without this, multiple downloads could potentially cause a DoS issue—something I'm definitely trying to avoid. I'm wondering if there are any alternatives to this method of serving files through rails?
Well, how sensitive are the files you're storing?
If you hosted these files somewhere under your app's /public directory, you could just do a meta tag or javascript redirect to the public-facing URL of these files after your users hit some sort of controller action that will update your download statistics.
In this case, your users would probably need to get one of those "Your download should commence in a few moments" pages before the browser would start the file download.
Under this scenario, your Rails application won't be streaming the file out, your web server will, which will give you the same effect as xsendfile. On the other hand, this won't work very well if you need to control access to those downloadable files.
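A minimal sketch of that "count, then redirect" idea follows, written as a plain Rack-style handler rather than a Rails controller. It uses an HTTP 302 redirect instead of the meta tag or JavaScript redirect mentioned above, and the in-memory DOWNLOAD_COUNTS hash stands in for a real statistics table. All names and paths here are hypothetical.

```ruby
# In-memory counter standing in for a database table (assumption).
DOWNLOAD_COUNTS = Hash.new(0)

# Hypothetical map from download id to a file under the app's /public directory.
PUBLIC_FILES = {
  "1" => "/downloads/manual.pdf",
  "2" => "/downloads/source.zip"
}.freeze

# Rack-style handler: record the hit, then redirect so that the web server,
# not the Ruby process, streams the file.
def track_and_redirect(id)
  path = PUBLIC_FILES[id.to_s]
  return [404, { "Content-Type" => "text/plain" }, ["Not found"]] unless path

  DOWNLOAD_COUNTS[path] += 1
  [302, { "Location" => path }, []]
end
```

As noted above, this only suits files that may be public, because anyone who learns the final URL can skip the counter and any access checks.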

Using a CDN to store/serve user image uploads?

I'm still new to the whole CDN idea, so this might be a stupid question but I'm sure someone can shed some light on this. I've got a basic PHP script that takes user image uploads, resizes them, creates a directory ($user_id), and stores the finished product in the directory (like www.mysite.com/uploads/$user_id/image1.jpg). Works like a charm.
I just got all the hosting stuff squared away and I'm using the Rackspace (Slicehost?) "Cloud Server" architecture. I also signed up for the Rackspace (Mosso?) "Cloud Files". So far so good.
So my question is: Should I be storing the images that users upload locally (on my apache server) or as objects via Cloud Files? It seems like a great idea to separate the static content from my web server so I can just use it to generate the dynamic content. But would it be a lot of overhead to create a CDN-enabled Container each time a user uploads an image?
Hopefully I'm not missing the boat on this one totally. I can't seem to find a whole lot of info about this, but I'm sure there is a good reason why I should either pursue or avoid this idea. Any suggestions are greatly appreciated!
I am not familiar with Rackspace's offering, but the general logic behind using a CDN for static content is to achieve two goals:
Offload the bandwidth and processing to other servers, freeing up yours.
Move the requests off to the client.
Move the large static content closer to the client.
When you send the generated HTML to the browser, it will "see" the images as www.yourdomain.com/my_image.jpg for example, and perform additional requests for each piece of static content, potentially starving your server of threads to service requests. If you move all this content onto a CDN, the browser would see something like cdn.yourdomain.com, and the browser will request the images from the CDN, thus allowing your server to service other requests instead. Additionally, most CDN's distribute your content to multiple locations and have geographic routing for requests to serve the content from the closest possible location, improving the perceived load time for clients.
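In practice, the switch described above often comes down to generating image URLs that point at the CDN host instead of the application host. A small sketch, assuming a hypothetical CDN_HOST of cdn.yourdomain.com and a hypothetical cdn_url helper:

```ruby
require "uri"

# Hypothetical CDN hostname, matching the example above.
CDN_HOST = "cdn.yourdomain.com"

# Turns a site-relative asset path into an absolute URL on the CDN host.
def cdn_url(path)
  path = "/#{path}" unless path.start_with?("/")
  URI::HTTPS.build(host: CDN_HOST, path: path).to_s
end
```

Templates would then emit cdn_url("my_image.jpg") in image tags, so the browser requests the file from the CDN rather than from the application server.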
