How can I accept large file uploads of around 250mb?

http://dropitto.me/ seems ok, but it only allows up to 75MB uploads, it requires your actual dropbox account password, and it does not use HTTPS for authentication - so a few red flags there.
I have a Dropbox Pro account and EC2 and S3 resources. I'm looking for a method to allow non-technical users to send files between 100 and 250 MB.
I'm not crazy about using FTP because I think it might be too technical for some users to set up. One option might be to ask them to register for a dropbox.com account and install the client and share a folder. Or share a folder with them to initiate the process.
But the real reason I'm asking this on StackOverflow is because I hope that there is some library that is really useful for doing this kind of stuff - and would be fast for the end users since the backbone could be "on the cloud". I don't really care what language it's written in.
Also let me just say that I'm not crazy about the idea of using rapidshare.com or megaupload.com or a service like that, but let me know if you would support those as the solution.

Check out http://kicksend.com - they allow uploads up to a certain size in the browser, but they have applications for Mac/Windows which allow basically unlimited file transfers very easily.

Web application security concerns with user-uploaded files on Amazon S3?

Background
I have a web application where users may upload a wide variety of files (e.g. jpg, png, r, csv, epub, pdf, docx, nb, tex, etc). Currently, we whitelist exactly which file types a user may upload. This limitation is sometimes annoying for users (i.e. because they must zip disallowed files, then upload the zip) and for us (i.e. because users write support asking for additional file types to be whitelisted).
Ideal Solution
Ideally, I'd love to whitelist file types more aggressively. Specifically, I'd like to (1) figure out which file types may be trusted and (2) whitelist them all. Having a larger whitelist would be more convenient for users and would reduce the number of support tickets (if only very slightly).
What I Know
I've done a few hours of research and have identified common problems (e.g. path traversal, placing assets in the root directory, htaccess vulnerabilities, failure to validate mime type, etc). While this research has been interesting, my understanding is that many of these issues are moot (or considerably mitigated) if your assets are stored on Amazon S3 (or a similar cloud storage service) – which is how most modern web applications manage user-uploaded files.
Hasn't this question already been asked a zillion times?!
Please don't mistake this as a general "What are the security risks of user-uploaded content?" question. There are already many questions like that and I don't want to rehash that discussion here.
More specifically, my question is, "What risks, if any, exist given a conventional / modern web application setup?" In other words, I don't care about some old PHP app or vulnerabilities related to IE6. What should I be worried about, assuming files are stored in a cloud service like Amazon S3?
Context about infrastructure / architecture
So... To answer that, you'll probably need more context about my setup. That said, I suspect this is a relatively common setup and therefore hope the answers will be broadly useful to anyone writing a modern web application.
My stack
Ruby on Rails application, hosted on Heroku
Users may upload a variety of files (via Paperclip)
Server validates both mime type and extension (against a whitelist)
Files are stored on Amazon S3 (with varying ACL permissions)
When a user uploads a file...
I upload the file directly to S3, into a tmp folder (it hasn't touched my server yet)
My server then downloads the file from the tmp folder on S3.
Paperclip runs validations and executes any processing (e.g. cutting thumbnails of images); see the sketch after this list
Finally, Paperclip places the file(s) back on S3 in their new, permanent location.
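For reference, a whitelist of that shape is straightforward to express in Paperclip. This is a minimal sketch, not the poster's actual code; the model, attachment name, and allowed types are hypothetical:

    class Article < ActiveRecord::Base
      # Hypothetical attachment name and whitelist, mirroring the setup above
      has_attached_file :upload,
                        storage: :s3,
                        s3_permissions: :private,
                        s3_credentials: { bucket: ENV["S3_BUCKET"],
                                          access_key_id: ENV["AWS_ACCESS_KEY_ID"],
                                          secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"] }

      # Validate both MIME type and extension against the whitelist
      validates_attachment_content_type :upload,
        content_type: %w[image/jpeg image/png text/csv application/pdf]
      validates_attachment_file_name :upload,
        matches: [/\.(jpe?g|png|csv|pdf)\z/i]
    end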
When a user downloads a file...
User clicks to download a file, which sends a request to my API (e.g. /api/article/123/download)
Internally, my API reads the file from S3 and then serves it to the user as an attachment (a rough sketch follows this list)
Thus the file does briefly pass through my server (i.e. not merely a redirect)
From the user's perspective, the file is served from my API (i.e. the user has no idea the file lives on S3)
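A download action matching that description might look roughly like this sketch (controller, model, and bucket names are made up; the aws-sdk-s3 gem is assumed):

    # Hypothetical controller behind /api/article/:id/download
    class Api::DownloadsController < ApplicationController
      def show
        article = current_user.articles.find(params[:id])  # authorize first
        object  = Aws::S3::Client.new.get_object(bucket: "myapp-uploads",
                                                 key: article.s3_key)
        # Serving as an attachment forces a download instead of in-browser
        # rendering, which also blunts stored XSS from HTML/SVG uploads
        send_data object.body.read,
                  filename: article.file_name,
                  type: article.content_type,
                  disposition: "attachment"
      end
    end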
Questions
Given this setup, is it safe to whitelist a wide range of file types?
Are there some types of files that are always best avoided (e.g. JS files)?
Are there any glaring flaws in my setup? I suspect not, but if so, please alert me!

Best way to serve files?

I'm a novice web developer with some background in programming (mostly Python). I'm looking for some basic advice on choosing the right technology.
I need to serve files over the internet (mp3s), but I need to implement some control on the access:
1. Files will be accessible only to authorized users.
2. I need to keep track of how many times a file was loaded, by whom, etc.
What might be the best technology to implement this? That is, should I learn Apache, or maybe Django? Or maybe something else?
I'm looking for a 'pointer' in the right direction.
Thanks!
R
If you need to track/control the downloads, that suggests the MP3 URLs need to be routed through a Rails controller. Very doable. At that point you can run your checks, track your stats, and send the file back.
If it's a lot of MP3s, you would rather not have Rails do the actual sending of the MP3 data, as it's a waste of its time and ties up an instance. Look into xsendfile, where Rails sends a response header indicating the file path and Apache intercepts it and does the actual sending (see the sketch after the links below).
https://tn123.org/mod_xsendfile/
http://rack.rubyforge.org/doc/classes/Rack/Sendfile.html
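As a rough sketch of that setup (the model and auth helper are hypothetical; the header name assumes Apache with mod_xsendfile in front):

    # config/environments/production.rb
    # Tell Rails to emit the header that mod_xsendfile intercepts
    config.action_dispatch.x_sendfile_header = "X-Sendfile"

    # app/controllers/mp3s_controller.rb
    class Mp3sController < ApplicationController
      def show
        track = Track.find(params[:id])     # hypothetical model
        authorize_download!(track)          # run your checks
        track.increment!(:download_count)   # track your stats
        # send_file now only sets the header; Apache streams the bytes,
        # freeing the Rails process immediately
        send_file track.local_path, type: "audio/mpeg", disposition: "attachment"
      end
    end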
You could use Django and Lighttpd as a web server. With Lighttpd you can use mod_secdownload, which enables you to generate time-limited URLs.
More info can be found here: http://redmine.lighttpd.net/projects/1/wiki/Docs_ModSecDownload
You can check for permissions in your Django (or any other) app and then redirect the user to this disposable URL if he passes the permission check.
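For what it's worth, the token mod_secdownload expects can be generated from any language. A sketch in Ruby, assuming the classic MD5 variant described in the docs linked above, with made-up secret/prefix values that would have to match lighttpd.conf:

    require 'digest/md5'

    SECRET     = "change-me"   # must match secdownload.secret (assumption)
    URI_PREFIX = "/dl/"        # must match secdownload.uri-prefix (assumption)

    def secure_download_url(rel_path)
      ts    = Time.now.to_i.to_s(16)                        # hex Unix timestamp
      token = Digest::MD5.hexdigest(SECRET + rel_path + ts) # MD5(secret + path + ts)
      "#{URI_PREFIX}#{token}/#{ts}#{rel_path}"
    end

    secure_download_url("/song.mp3")  # => "/dl/<token>/<hex-ts>/song.mp3"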

Secure keys in iOS App scenario, is it safe?

I am trying to hide 2 secrets that I am using in one of my apps.
As I understand, the keychain is a good place, but I cannot add them before I submit the app.
I thought about this scenario -
Pre-seed the secrets in my app's Core Data database by spreading them across other entities to obscure them. (I already have a seed DB in that app.)
As the app launches for the first time, generate and move the keys to the keychain.
Delete the records from CoreData.
Is that safe, or can a hacker see this happening and get those keys?
THIRD EDIT:
Sorry for not explaining this scenario from the beginning - The App has many levels, each level contains files (audio, video, images). The user can purchase a level (IAP) and after the purchase is completed I need to download the files to his device.
For iOS 6 the files are stored with Apple's new "Hosted Content" feature. For iOS 5 the files are stored in Amazon S3.
So in all this process I have 2 keys:
1. IAP key, for verifying the purchase at Apple IAP.
2. S3 keys, for getting the files from S3 for iOS5 users:
NSString *secretAccessKey = @"xxxxxxxxx";
NSString *accessKey = @"xxxxxxxxx";
Do I need to protect those keys at all? I am afraid that people will be able to get the files from S3 without purchasing the levels, or that hackers will be able to build a hacked version with all the levels pre-downloaded inside.
Let me try to break down your question into multiple subquestions/assumptions:
Assumptions:
a) Keychain is a safe place
Actually, it's not that safe. If your application is installed on a jailbroken device, a hacker will be able to get your keys from the keychain.
Questions:
a) Is there a way to put some key into an app (a binary which is delivered from the App Store) and be completely secure?
Short answer is NO. As soon as there is something in your binary, it could be reverse engineered.
b) Will obfuscation help?
Yes. It will increase the time it takes a hacker to figure it out. If the keys in your app "cost" less than the time spent on reverse engineering, then, generally speaking, you are good.
However, in most cases security through obscurity is bad practice. It gives you a feeling that you are secure, but you aren't.
So, this could be one of security measures, but you need to have other security measures in place too.
c) What should I do in such a case?
It's hard to give you a good solution without knowing the background of what you are trying to do.
As an example, why should everybody have access to the same Amazon S3 bucket? Do they need read-only or write access (as pointed out by Kendall Helmstetter Gein)?
I believe one of the most secure scenarios would be something like this:
Your application should be passcode protected
The first time you enter your application, it asks the user to authenticate (enter his username and password)
This authenticates against your server or other authentication provider (e.g. Google)
The server sends some authentication token to a device (quite often it's some type of cookie).
You encrypt this token using a hash of your application passcode and save it in the keychain in this form (a sketch of this step appears below)
And now you can do one of two things:
hand over specific keys from the server to the client (so each client will have their own keys) and encrypt them with the hash of your application passcode
handle all operations with S3 on the server (and require the client to go through it)
This way you protect against multiple possible attacks.
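To make the encrypt-then-store step concrete, here is a language-neutral sketch of the idea (shown in Ruby for brevity; a real iOS client would do the equivalent with CommonCrypto and the keychain APIs):

    require 'openssl'
    require 'securerandom'

    # Derive a key from the app passcode and encrypt the server token with it
    def encrypt_token(token, passcode)
      salt   = SecureRandom.random_bytes(16)
      key    = OpenSSL::PKCS5.pbkdf2_hmac(passcode, salt, 20_000, 32,
                                          OpenSSL::Digest::SHA256.new)
      cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
      cipher.key = key
      iv   = cipher.random_iv
      data = cipher.update(token) + cipher.final
      # Store salt, iv, auth tag and ciphertext together in the keychain entry
      { salt: salt, iv: iv, tag: cipher.auth_tag, data: data }
    end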
d) Whoooa.... I don't plan to implement all of this stuff you just wrote, because it will take me months. Is there anything simpler?
I think it would be useful if you had one set of keys per client.
If even this is too much, then download encrypted keys from the server, save them in encrypted form on the device, and have the decryption key hardcoded into your app. I would say it's minimally invasive, and at least your binary doesn't have the keys in it.
P.S. Both Kendall and Rob are right.
Update 1 (based on new info)
First of all, have you seen the In-App Purchase Programming Guide?
There is a very good drawing under the Server Product Model. This model protects against somebody who didn't buy new levels: there will be no Amazon keys embedded in your application, and your server side will hand over levels when it receives a receipt of purchase (a sketch of that receipt check follows below).
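A minimal sketch of the server side of that model, assuming Apple's documented verifyReceipt endpoint (the surrounding app logic is hypothetical):

    require 'net/http'
    require 'json'
    require 'uri'

    # Use https://sandbox.itunes.apple.com/verifyReceipt while testing
    VERIFY_URL = URI("https://buy.itunes.apple.com/verifyReceipt")

    def purchase_valid?(receipt_data_base64)
      response = Net::HTTP.post(VERIFY_URL,
                                { "receipt-data" => receipt_data_base64 }.to_json,
                                "Content-Type" => "application/json")
      JSON.parse(response.body)["status"] == 0  # 0 means a valid receipt
    end

    # Only if the receipt checks out does the server hand the level files
    # (or short-lived URLs for them) back to the device.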
There is no perfect solution to protect against somebody who purchased the content (and decided to rip it off from your application), because at the end of the day your application will have the content downloaded to the device and will need it in plain (unencrypted) form at some point in time.
If you are really concerned about this case, I would recommend encrypting all your assets and handing them over in encrypted form from the server together with an encryption key. The encryption key should be generated per client, and assets should be encrypted using it.
This won't stop an advanced hacker, but at least it will protect against somebody using iExplorer and just copying files (since they will be encrypted).
Update 2
One more thing regarding update 1: you should store the files encrypted and store the encryption key somewhere (e.g. in the keychain).
In case your game requires an internet connection, the best idea is to not store the encryption key on the device at all. You can fetch it from the server each time your app is started.
DO NOT store an S3 key used for writes in your app! In short order someone sniffing traffic will see the write call to S3, and in shorter order they will find that key and do whatever they like.
The ONLY way an application can write content to S3 with any degree of security is by going through a server you control.
If it's a key used for read-only access, meaning your S3 content cannot be read publicly but the key can read with no ability to write, then you could embed it in the application, but anyone who wants to can pull it out.
To lightly obscure pre-loaded sensitive data, you could encrypt it in a file, and the app can read it into memory and decrypt it before storing it in the keychain. Again, someone will be able to get to these keys, so it had better not matter much if they can.
Edit:
Based on the new information, you are probably better off just embedding the secrets in code. Using a tool like iExplorer, a casual user can easily get to a Core Data database or anything else in your application bundle, but object files are somewhat encrypted. If they have a jailbroken device they can easily get the unencrypted versions, but it can still be hard to find meaningful strings; perhaps store them in two parts and re-assemble them in code.
Again, it will not stop a determined hacker, but it's enough to keep most people out.
You might also want to add some code that asks your server whether there are any override secrets it can download. That way, if the secrets are leaked, you could react quickly by changing the secrets used for your app, while shutting out anyone using a copied secret. To start with, there would be no override to download. You don't want to have to wait for an application update to be able to use new keys.
There is no good way to hide a secret in a piece of code you send your attacker. As with most things of this type, you need to focus more on how to mitigate the problem when the key does leak rather than spending unbounded time trying to protect it. For instance, generating different keys for each user allows you to disable a key if it is being used abusively. Or working through an intermediary server allows you to control the protocol (i.e. the server has the key and is only willing to do certain things with it).
It is not a waste of time to do a little obfuscating. That's fine. But don't spend a lot of time on it. If it's in the program and it's highly valuable, then it will be hacked out. Focus on how to detect when that happens, and how to recover when it does. And as much as possible, move that kind of sensitive data into some other server that you control.

Rails implementation for securing S3 documents

I would like to protect my S3 documents behind my Rails app such that if I go to www.myapp.com/attachment/5, it should authenticate the user prior to displaying/downloading the document.
I have read similar questions on stackoverflow but I'm not sure I've seen any good conclusions.
From what I have read there are several things you can do to "protect" your S3 documents.
1) Obfuscate the URL. I have done this. I think this is a good thing to do so no one can guess the URL. For example, it would be easy to "walk" the URLs if your S3 URLs are obvious: https://s3.amazonaws.com/myapp.com/attachments/1/document.doc. Having a URL such as:
https://s3.amazonaws.com/myapp.com/7ca/6ab/c9d/db2/727/f14/document.doc seems much better.
This is great to do but doesn't resolve the issue of passing around URLs via email or websites.
2) Use an expiring URL as shown here: Rails 3, paperclip + S3 - Howto Store for an Instance and Protect Access
For me, however, this is not a great solution because the URL is exposed (even if just for a short period of time) and another user could perhaps reuse the URL in the meantime. You have to adjust the time to allow for the download without providing too much time for copying. It just seems like the wrong solution.
3) Proxy the document download via the app. At first I tried to just use send_file: http://www.therailsway.com/2009/2/22/file-downloads-done-right but the problem is that it can only serve static/local files on your server, not files served via another site (S3/AWS). I can, however, use send_data to load the document into my app and immediately serve it to the user. The problem with this solution is obvious: twice the bandwidth and twice the time (to load the document into my app and then send it back to the user).
I'm looking for a solution that provides the full security of #3 but does not require the additional bandwidth and time for loading. It looks like Basecamp is "protecting" documents behind their app (via authentication), and I assume other sites are doing something similar, but I don't think they are using my #3 solution.
Suggestions would be greatly appreciated.
UPDATE:
I went with a 4th solution:
4) Use Amazon bucket policies to control access to the files based on referrer:
http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?UsingBucketPolicies.html
UPDATE AGAIN:
Well, #4 can easily be worked around via a browser's developer tools. So I'm still in search of a solid solution.
You'd want to do two things:
Make the bucket and all objects inside it private. The naming convention doesn't actually matter; the simpler, the better.
Generate signed URLs, and redirect to them from your application. This way, your app can check whether the user is authenticated and authorized, then generate a new signed URL and redirect the user to it using a 301 HTTP status code. This means the file never goes through your servers, so there's no load or bandwidth cost for you. Here are the docs for presigning a GET Object request (a sketch follows the link below):
https://docs.aws.amazon.com/sdk-for-ruby/v3/api/Aws/S3/Presigner.html
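With the v3 Ruby SDK linked above, that flow might look like this sketch (bucket, key layout, and auth helpers are placeholders). One caveat on the design: browsers may cache a 301, so a 302/303 can be preferable when the target URL expires:

    require 'aws-sdk-s3'

    class AttachmentsController < ApplicationController
      before_action :authenticate_user!  # whatever auth you use

      def show
        attachment = current_user.attachments.find(params[:id])  # authorize
        url = Aws::S3::Presigner.new.presigned_url(
          :get_object,
          bucket:     "myapp.com",
          key:        attachment.s3_key,
          expires_in: 300  # seconds
        )
        redirect_to url, status: :moved_permanently
      end
    end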
I would vote for number 3; it is the only truly secure approach, because once you pass the user an S3 URL, it remains valid until its expiration time. A crafty user could use that hole; the only question is, will that affect your application?
Perhaps you could set the expiry time lower, which would minimise the risk?
Take a look at an excerpt from this post:
Accessing private objects from a browser
All private objects are accessible via an authenticated GET request to the S3 servers. You can generate an authenticated URL for an object like this:

    S3Object.url_for('beluga_baby.jpg', 'marcel_molina')

By default, authenticated URLs expire 5 minutes after they were generated. Expiration options can be specified either as an absolute time since the epoch with the :expires option, or as a number of seconds relative to now with the :expires_in option.
I have been trying to do something similar for quite some time now. If you don't want to use the bandwidth twice, then the only way this is possible is to let S3 do it. Now, I am totally with you about the exposed URL. Were you able to come up with any alternative?
I found something that might be useful in this regard - http://docs.aws.amazon.com/AmazonS3/latest/dev/AuthUsingTempFederationTokenRuby.html
Once a user logs in, an AWS session with his IP as part of the AWS policy should be created, and this can then be used to generate the signed URLs. So in case somebody else grabs the URL, the signature will not match, since the source of the request will be a different IP. Let me know if this makes sense and is secure enough; a sketch of this federation-token approach follows.
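A sketch of that idea with the Ruby SDK's STS client (region, ARN, and IP are placeholders; the policy pins the temporary credentials to the requester's IP):

    require 'aws-sdk-core'  # provides Aws::STS::Client
    require 'json'

    policy = {
      "Version"   => "2012-10-17",
      "Statement" => [{
        "Effect"    => "Allow",
        "Action"    => "s3:GetObject",
        "Resource"  => "arn:aws:s3:::myapp.com/*",
        "Condition" => { "IpAddress" => { "aws:SourceIp" => "203.0.113.7/32" } }
      }]
    }.to_json

    creds = Aws::STS::Client.new(region: "us-east-1")
                            .get_federation_token(name: "user-123",
                                                  policy: policy,
                                                  duration_seconds: 900)
                            .credentials
    # creds.access_key_id / secret_access_key / session_token can now sign
    # GET requests that are both time- and IP-limited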

Why would you upload assets directly to S3?

I have seen quite a few code samples/plugins that promote uploading assets directly to S3. For example, if you have a user object with an avatar, the file upload field would load directly to S3.
The only way I see this being possible is if the user object is already created in the database and your S3 bucket + path is something like
user_avatars.domain.com/some/id/partition/medium.jpg
But then if you had an image tag that tried to access that URL when no avatar was uploaded, it would yield a bad result. How would you handle checking for existence?
Also, it seems like this would not work well for most has-many associations. For example, if a user had many songs/mp3s, where would you store those, and how would you access them?
Also, your validations will be shot.
I am having trouble thinking of situations where direct upload to S3 (or any cloud) is a good idea and was hoping people could clarify either proper use cases, or tell me why my logic is incorrect.
Why pay for storage/bandwidth/backups/etc. when you can have somebody in the cloud handle it for you?
S3 (and other Cloud-based storage options) handle all the headaches for you. You get all the storage you need, a good distribution network (almost definitely better than you'd have on your own unless you're paying for a premium CDN), and backups.
Allowing users to upload directly to S3 takes even more of the bandwidth load off of you. I can see the tracking concerns, but S3 makes it pretty easy to handle that situation. If you look at the direct upload methods, you'll see that you can force a redirect on a successful upload.
Amazon will then pass the following to the redirect handler: bucket, key, etag
That should give you what you need to track the uploaded asset after success. Direct uploads give you the best of both worlds. You get your tracking information and it unloads your bandwidth.
Check this link for details: Amazon S3: Browser-Based Uploads using POST
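For the redirect flow described above, the v3 Ruby SDK can build the browser POST form for you. A sketch with placeholder bucket and redirect values:

    require 'aws-sdk-s3'

    bucket = Aws::S3::Resource.new(region: "us-east-1")
                              .bucket("user-avatars-example")

    post = bucket.presigned_post(
      key: "uploads/${filename}",            # S3 fills in the file's name
      success_action_redirect: "https://myapp.example/uploads/complete",
      content_length_range: 0..262_144_000   # cap uploads at ~250 MB
    )

    # Render post.url as the form's action and post.fields as hidden inputs;
    # on success S3 redirects back with bucket, key and etag query params.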
If you are hosting your Rails application on Heroku, the reason could very well be that Heroku doesn't allow file uploads larger than 4 MB:
http://docs.heroku.com/s3#direct-upload
So if you would like your users to be able to upload large files, this is the only way forward.
Remember how web servers work.
Unless you're using a sort of async web setup like you could achieve with Node.js or Erlang (just two examples), every upload request your web application serves ties up an entire process or thread while the file is being uploaded.
Imagine that you're uploading a file that's several megabytes large. Most internet users don't have tremendously fast uplinks, so your web server spends a lot of time doing nothing. While it's doing all of that nothing, it can't service any other requests. Which means your users start to get long delays and/or error responses from the server. Which means they start using some other website to get the same thing done. You can always have more processes and threads running, but each of those costs additional memory which eventually means additional $.
By uploading straight to S3, in addition to the bandwidth savings that Justin Niessner mentioned and the Heroku workaround that Thomas Watson mentioned, you let Amazon worry about that problem. You can have a single-process webserver effectively handle very large uploads, since it punts that actual functionality over to Amazon.
So yeah, it's more complicated to set up, and you have to handle the callbacks to track things, but if you deal with anything other than really small files (and even in those cases), why cost yourself more money?
