Proper Amazon AWS S3 usage - ruby-on-rails

Here's a quick rundown. We have two apps, and they differ from each other in every respect.
Our first app is a company profile (4-page layout: home, products, about, contact). I don't think it will generate the kind of traffic that social websites do. It allows staff members to post product photos. Is it better to store this content in AWS S3, or would I be better off storing the files on the local web server?
Our second app is more social-oriented, and for it we have decided to use an S3 bucket. Since we already have an AWS account, should we create a bucket for the first app on that same account, or will this just add expenses in the long run? What are your thoughts?
On another note: should app-related content (logo, icons, background images, buttons, etc.) be stored on S3 or on the local server? What is the general consensus on this?

I would keep anything that is "hard-coded" into your site (i.e. the images referenced in your CSS and anything in your HTML/ERB/HAML views that is referenced by file name) on the web server itself. You don't have to do this, but one reason to do it is that if there's ever an S3 outage or a configuration problem, these 'basic' visual resources won't be affected, so your site will still look pretty much intact.
Plus, this kind of content already has an appropriate place to live (your public folder) and Rails helpers that make it really easy to access.
I just can't see a reason to mess with S3 if it's not going to be a high volume of data and bandwidth.
On the other hand, if it IS going to be very high bandwidth and you're using a cheap hosting platform that doesn't give you much bandwidth, moving those images to S3 would save you some. But are you really in danger of exceeding your server's bandwidth limits?
Lastly, I think your OTHER app is using S3 as intended; that's a good use of S3. I would not necessarily recommend mixing buckets for different apps in one account, though. The simplest reason: say one of your apps grows and at some point merges with, partners with, or is sold to another party. Really, imagine any scenario where another interest becomes involved in either app.
Now you've got no way to maintain separation of access between the resources your two different apps are using. So you have to either migrate to a new platform for one of them, or you have to deal with them being 'joined at the hip' so to speak.
In other words, I just don't see an upside to sharing AWS resources between two apps that aren't directly related to each other. No reason to make them a package deal if they aren't naturally a package deal.
On the other hand if it DOES make sense for them to be totally linked, then it doesn't really matter either way.
Hope that helps you think about it. Good luck!
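If the two apps do end up in the same AWS account, one common way to keep their access separate is to give each app its own bucket and its own IAM credentials, with a policy scoped to that bucket only. A minimal sketch, assuming hypothetical bucket names, that builds such a policy document with Ruby's standard `json` library (the action names and ARN format are standard IAM; how you attach the policy to a user or role is left out):

```ruby
require "json"

# Build an IAM policy that grants access to exactly one app's bucket.
# The bucket names used below are hypothetical examples.
def bucket_policy_for(bucket)
  {
    "Version" => "2012-10-17",
    "Statement" => [
      {
        # Object-level actions apply to the keys inside the bucket.
        "Effect" => "Allow",
        "Action" => ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
        "Resource" => "arn:aws:s3:::#{bucket}/*"
      },
      {
        # Listing applies to the bucket itself, not to its keys.
        "Effect" => "Allow",
        "Action" => ["s3:ListBucket"],
        "Resource" => "arn:aws:s3:::#{bucket}"
      }
    ]
  }
end

profile_policy = bucket_policy_for("example-company-profile-assets")
social_policy  = bucket_policy_for("example-social-app-uploads")

puts JSON.pretty_generate(profile_policy)
```

With one set of credentials per app, revoking or handing over one app's access doesn't touch the other's, which is the separation the answer is worried about.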

S3 is ridiculously cheap, so I'd put any image uploads there.
In the past I've put static images, stylesheets, JavaScript files and so forth on S3, but these days I just leave them on the web server. The reason is Jammit, which is oh sooo cool at packaging, updating, compressing, etc. assets.
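Part of what an asset packager buys you is fewer, pre-compressed files served straight from your own web server. A rough sketch of just the concatenate-and-gzip step, using only Ruby's standard library; it is a hypothetical stand-in, not Jammit's API, and it does no minification or cache-busting:

```ruby
require "zlib"
require "tmpdir"

# Concatenate several JavaScript files into one bundle and write a gzipped
# copy next to it, so the web server can send the pre-compressed file.
def package_assets(paths, output_path)
  # Separate files with ";" so a file missing its final semicolon can't run
  # into the first statement of the next one.
  bundle = paths.map { |path| File.read(path) }.join("\n;\n")
  File.write(output_path, bundle)
  File.binwrite("#{output_path}.gz", Zlib.gzip(bundle))
  output_path
end

# Usage with two hypothetical script files in a scratch directory.
Dir.mktmpdir do |dir|
  names = %w[plants.js calendar.js]
  names.each_with_index { |name, i| File.write(File.join(dir, name), "var module#{i} = #{i}") }
  out = package_assets(names.map { |name| File.join(dir, name) }, File.join(dir, "all.js"))
  puts "#{File.size(out)} bytes -> #{File.size("#{out}.gz")} bytes gzipped"
end
```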

Related

How are videos stored on web servers these days?

I'm building a web app that needs to store some resources, including but not limited to articles, pictures and videos. My question is: how are videos (mp4/ogg) stored on web servers? Just as bare files, or as binaries in a relational or NoSQL DB?
The question of whether to store data as BLOBs almost always comes down to "don't BLOB data". There are very few cases where it makes more sense to write a database connector for your data than to just keep it on disk.
The general trend is to use an established service that employs good design patterns, such as Paperclip for ruby, and tailor it to your needs.
Using an external storage service is also a good idea; for example, Amazon S3 will store all of your data for pennies per gigabyte, and it does an excellent job of it.
If you do decide to set up your own server that handles the data internally, might I recommend DigitalOcean? I have been very happy with the SSD servers I have set up there (which are super fast).
For video you will almost certainly need a web server that is capable of streaming the file, i.e. serving byte ranges so the player can start playback and seek without downloading everything. I think Nginx has this feature.
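Streaming a stored video mostly comes down to honoring HTTP `Range` requests and answering with `206 Partial Content`. A minimal sketch of the server-side logic in plain Ruby; it handles only the single-range forms of the header (`bytes=a-b`, `bytes=a-`, `bytes=-n`) and deliberately ignores multi-range requests and the rest of the HTTP range rules:

```ruby
# Parse a single-range "bytes=..." header against a file of known size.
# Returns the inclusive byte range to send, or nil if the header is missing,
# malformed, or unsatisfiable (a server would then send the whole file or a 416).
def parse_byte_range(header, file_size)
  return nil if header.nil? || file_size <= 0
  match = /\Abytes=(\d*)-(\d*)\z/.match(header.strip)
  return nil unless match

  first, last = match[1], match[2]
  if first.empty?
    # Suffix form "bytes=-N": the last N bytes of the file.
    return nil if last.empty? || last.to_i.zero?
    start = [file_size - last.to_i, 0].max
    finish = file_size - 1
  else
    start = first.to_i
    return nil if start >= file_size
    # Open-ended "bytes=N-" runs to the end; explicit ends are clamped to the file.
    finish = last.empty? ? file_size - 1 : [last.to_i, file_size - 1].min
    return nil if finish < start
  end
  start..finish
end

# Read only the requested slice of the file from disk.
def read_range(path, range)
  File.open(path, "rb") do |file|
    file.seek(range.begin)
    file.read(range.size)
  end
end
```

The range returned also gives the `Content-Range: bytes start-finish/size` value for the response header.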
I think you need to elaborate a bit on the use case you want to implement for this app. Only then can you get a precise answer.
To help with that, here are some questions to ponder:
1- You said you want to store videos; what are your requirements beyond storage?
2- Do you want, for example, to give third-party users access to these videos and let them search by keyword?
3- If yes, what kind of information is available about the videos? What is the expected average size of these files?
Many database engines can store big binary files, but that comes with a performance cost. That's why most storage systems that deal with big files store the files themselves on disk and keep the related metadata (file name, last-updated time, associated keywords, etc.) in the database. That makes for a scalable system.
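The "files on disk, metadata in the database" split can be sketched as follows. This is a minimal illustration, assuming a plain Hash in place of a real database table and hypothetical field names; upload libraries such as Paperclip follow broadly the same split between stored file and DB columns:

```ruby
require "digest"
require "fileutils"
require "time"
require "tmpdir"

# File bytes go to disk; only metadata goes to the "database" (a Hash here,
# standing in for a real table such as an ActiveRecord model).
class FileStore
  attr_reader :records

  def initialize(root)
    @root = root
    @records = {}
  end

  def save(filename, bytes, keywords: [])
    digest = Digest::SHA256.hexdigest(bytes)
    # Name files by content hash and fan them out ("ab/cd/abcd...") so no single
    # directory grows huge; identical uploads end up sharing one file on disk.
    relative = File.join(digest[0, 2], digest[2, 2], digest)
    absolute = File.join(@root, relative)
    FileUtils.mkdir_p(File.dirname(absolute))
    File.binwrite(absolute, bytes) unless File.exist?(absolute)

    id = @records.size + 1
    @records[id] = {
      filename: filename,
      path: relative,
      size: bytes.bytesize,
      keywords: keywords,
      updated_at: Time.now.utc.iso8601
    }
    id
  end

  def read(id)
    File.binread(File.join(@root, @records.fetch(id)[:path]))
  end

  # Searching touches only the metadata, never the large files themselves.
  def search(keyword)
    @records.select { |_id, meta| meta[:keywords].include?(keyword) }.keys
  end
end

Dir.mktmpdir do |dir|
  store = FileStore.new(dir)
  id = store.save("tulips.mp4", "fake video bytes", keywords: %w[tulips spring])
  p store.records[id]
  p store.search("spring")
end
```

Because identical uploads share one file, deleting a record should only remove the file once no other record points at the same path.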
I'll expand this answer if you find it useful and have further related questions.
Unlimited file storage is difficult to set up without AWS S3. S3 is a cheap and scalable solution, but it gets expensive to use without proper caching, so we use an Nginx S3 proxy that works well: https://stackoverflow.com/a/44749584/290338

A place to store file ( Ruby on Rails )

I'm new to Rails. I want to make a website for uploading files: music, videos, pictures, text. What is the best way to store the files? I've read about different methods: in the database, as files on disk, or in Amazon S3.
There will be a lot of files, from around 1 KB to 20 MB each.
Thanks!
Storing files in a database is not bad, per se. It depends on the kind of database.
Storing files in a relational database is not viewed as good practice, for the reasons explained by bassneck.
But there are other kinds of databases that are specifically designed to store any kind of data, including files, in a non-relational way. As Dhruva's answer highlights, MongoDB is pretty good, and its support for storing files using GridFS is awesome.
GridFS is very good; for example, it can stream just part of a file, which is pretty useful for video.
In your specific case (many small files of many kinds of data), GridFS is a real option. I use Heroku and mongohq.com and they work like a charm.
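Partial streaming works because GridFS splits each file into fixed-size chunks (255 KiB by default), each stored as its own document with a sequence number `n`. A simplified in-memory model of that layout, to show why reading a byte range only needs the overlapping chunks; it mimics the storage scheme and is NOT the MongoDB driver API:

```ruby
# Simplified model of GridFS storage: fixed-size chunks, each tagged with its
# index "n" (in MongoDB each chunk is a separate document in fs.chunks).
DEFAULT_CHUNK_SIZE = 255 * 1024 # GridFS default chunk size

def to_chunks(bytes, chunk_size = DEFAULT_CHUNK_SIZE)
  data = bytes.b
  (0...data.bytesize).step(chunk_size).each_with_index.map do |offset, n|
    { n: n, data: data.byteslice(offset, chunk_size) }
  end
end

# Return `length` bytes starting at `offset`, touching only overlapping chunks.
def read_partial(chunks, offset, length, chunk_size = DEFAULT_CHUNK_SIZE)
  return "".b if length <= 0
  first = offset / chunk_size
  last = (offset + length - 1) / chunk_size
  needed = chunks.select { |chunk| chunk[:n].between?(first, last) }
  joined = needed.sort_by { |chunk| chunk[:n] }.map { |chunk| chunk[:data] }.join
  joined.byteslice(offset - first * chunk_size, length) || "".b
end

video = "x" * (600 * 1024)                       # a fake 600 KiB "video"
chunks = to_chunks(video)
puts chunks.size                                 # 3 chunks of at most 255 KiB
puts read_partial(chunks, 300_000, 10).bytesize  # 10, read from chunk 1 only
```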
I don't think there is one right answer for that. I use Heroku, so my only option is Amazon S3 (which has a free tier for the first year), and I'm pretty satisfied with it. I use the CarrierWave gem for uploading files and it's really easy to use; I much prefer it to Paperclip.
If your hosting provides a lot of space and bandwidth then you could give it a try.
But for the database, I really don't like the idea of storing files in it.
Update:
Reading a filename from a table should be faster than reading the file itself. The bigger the DB, the longer backups take. And if you take Heroku as an example, you only get 20 GB for a shared DB, which holds only about 1,000 to 2,000 files of 10-20 MB each. Moving the project to another host might be easier if you store files in the DB, but if you use an external service (such as S3), there would be no difference for you.
Amazon S3 integration on Heroku is relatively easy to set up and get working with the Paperclip gem. Heroku has some documentation on how to get this up and running.
Take a look at their documentation and see if this is the kind of thing you're looking for.
http://devcenter.heroku.com/articles/s3
I suggest you also check out and consider MongoDB GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification. My experience with GridFS has been good. I can't give you a comparative analysis with the other options, though I would like to see one too.
I would recommend you use Amazon S3 to store your files. It is the best.

How do I safely and inexpensively allow images on my site?

I have developed a social networking website for gardeners and am interested in giving users the ability to add images to their "tweets".
If I allow them to upload images to the actual site, it seems like this will quickly become expensive (this is a side project, funded by no one other than myself and my own obsessions). Let's say the site becomes moderately popular, with 100K users each posting one image a week, of only 250 KB in size. That's 100,000 × 52 × 250 KB ≈ 1.3 billion KB, or roughly 1.2 TB/year in storage (and that doesn't take into account increased bandwidth). Plus, scaling (resizing) the images would add to the server load. I'm not sure if I should just go ahead with this, or if there are better possibilities.
Linking to other sites seems better in some ways. You do get broken links, but a larger concern for me is security: XSS.
The application is on Rails 3, using MongoDB / Mongoid as the backend, if that matters.
I'm looking for solutions such as:
- APIs that store images on external sites. Ideally, I could upload an image to my site and then make an API call to store it on an external site.
- APIs (perhaps JavaScript APIs) that make it easy to link to one or more external image hosting sites securely.
- Markdown or similar markup that allows linking to external images securely. I am interested in giving users the ability to format their posts in limited ways, so this might solve two problems at once. I notice that this is what Stack Overflow does.
- Security libraries that whitelist image URL patterns.
- Advice on why I am thinking about this problem wrong. For example, maybe I should just store the images. Around 1.2 TB a year may not be all that expensive, and storing the images does allow me to create a very clean user experience.
My objectives are (in order):
- Secure, both for my own site, and to not allow XSS attacks against other sites
- Best possible user experience
- Easy to maintain and implement
What have you done to allow user-supplied images on your site?
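For the "whitelist image URL patterns" option, a minimal sketch using only Ruby's standard `uri` library. The allowed hosts and extensions are hypothetical placeholders; this validates the URL only, so the value must still be HTML-escaped when written into a page (Rails 3 views escape output by default):

```ruby
require "uri"

# Hypothetical allowlist: only these hosts may be embedded as images.
ALLOWED_IMAGE_HOSTS = %w[images.example.com photos.example.org].freeze
IMAGE_EXTENSIONS = %w[.jpg .jpeg .png .gif].freeze

# Return the URL string if it is acceptable as an <img src>, otherwise nil.
def safe_image_url(raw)
  uri = URI.parse(raw.to_s.strip)
  # Only absolute http/https URLs: rejects javascript:, data: and "//host" forms.
  return nil unless uri.is_a?(URI::HTTP) && uri.host
  return nil if uri.userinfo
  return nil unless ALLOWED_IMAGE_HOSTS.include?(uri.host.downcase)
  return nil unless IMAGE_EXTENSIONS.include?(File.extname(uri.path).downcase)
  uri.to_s
rescue URI::InvalidURIError
  # Unparseable input (stray quotes, spaces, etc.) is simply rejected.
  nil
end
```

An allowlist is deliberately stricter than trying to blacklist bad patterns: anything not explicitly permitted is dropped.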
You're thinking about the problem wrong ;) or rather not at the right time.
Don't worry about the bandwidth now, when you don't have many users yet. Concentrate on making the site user-friendly and popular first. Performance, bandwidth, disk space: these are the things you'll work on when they become problems. By the time you have 100K users, the cost of buying that space and bandwidth on, say, Amazon S3 may no longer be an issue.
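To put a number on the scenario from the question (100K users, one 250 KB image each per week), a quick back-of-the-envelope calculation. It covers storage only, not bandwidth, and uses binary units:

```ruby
# Yearly storage for user-uploaded images, in GB (1 GB = 1024 * 1024 KB).
def yearly_storage_gb(users:, images_per_week:, kb_per_image:)
  kb_per_year = users * images_per_week * 52 * kb_per_image
  kb_per_year / 1024.0 / 1024.0
end

gb = yearly_storage_gb(users: 100_000, images_per_week: 1, kb_per_image: 250)
puts format("%.0f GB (about %.1f TB) per year", gb, gb / 1024)
# prints "1240 GB (about 1.2 TB) per year"
```

Multiplying the result by your provider's current per-GB monthly price gives a rough storage bill; bandwidth is billed separately.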
Why not use a service like Amazon S3? It's cheap, very cheap (with Reduced Redundancy Storage), and most importantly, plugins like Paperclip support it out of the box.
You will need to look at the terms and conditions of picture hosts (Flickr, etc.) and see whether your usage is permitted. Flickr has an API; I'm not sure about the others, just search for "<host name> API".
Flickr's API is at:
http://www.flickr.com/services/api/

What design considerations should one take to receive text and multiple attachments via web?

I am developing a web application to accept a bunch of text and attachments (1 or more) via email, web and other methods.
I am planning to build a single interface, most likely a web service, to accept this content.
What design considerations should I make?
I am building the app using ASP.NET MVC 2.
Should the attachments be saved to disk or in the database?
Should the unified single interface be a web service?
Pros and cons to using web services to upload files
As with any acceptance of files, I'd be checking them for viruses or the like. I'm very nervous about files transmitted from the internet.
I always like putting my files in a database because I find it neater. I hate having files spread across the network in folders that need permissions, etc. I know there are people who prefer it the other way, so I guess my answer also depends on personal preference.
I like the DB approach because I can more easily tie files to records and do searches. If you use a file system, you still need to store information about each file in the database, plus do the extra work of storing the file itself.
Then, if you need to move files around, you may also need to update the references in the database.
Then again, with a database you need to allocate enough space for it to grow, and perhaps cater for multiple databases as storage runs out.
So if you're handling large files, I can see the point of a file system, as it's easier to grow. If you have small text files, a database may work.
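A cheap first check on uploads, before any real virus scan, is to verify that a file's leading bytes match the type you expect instead of trusting its name or declared content type. A minimal sketch in Ruby with a hypothetical image-only allowlist; this is NOT a virus scanner and does not replace one:

```ruby
# Known leading bytes ("magic numbers") for a few image formats.
IMAGE_SIGNATURES = {
  "image/png"  => "\x89PNG\r\n\x1A\n".b,
  "image/jpeg" => "\xFF\xD8\xFF".b,
  "image/gif"  => "GIF8".b # covers both GIF87a and GIF89a
}.freeze

# Return the detected MIME type, or nil if no known signature matches.
def sniff_type(bytes)
  head = bytes.b
  IMAGE_SIGNATURES.each { |type, magic| return type if head.start_with?(magic) }
  nil
end

# Accept the upload only if its actual content matches an allowed type.
def acceptable_upload?(bytes, allowed: IMAGE_SIGNATURES.keys)
  type = sniff_type(bytes)
  !type.nil? && allowed.include?(type)
end
```

Files that pass this check would still go through a proper scanner before being stored or served.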

How do image hosting sites enforce content policies?

I'm trying to figure out how to best implement a public data hosting service.
How do websites that let users upload pictures enforce their terms of service regarding obscene pictures? Do they use image processing algorithms to flag potential violations (e.g. too many skin-colored pixels)? I think ImageShack looks at the websites their pictures are hotlinked on and checks for keywords. If it detects anything porn-related, it removes the picture and bans the account. Are there other methods?
Is enforcement largely automated or is it based more on user reports?
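The "too many skin-colored pixels" idea from the question can be sketched as below. The per-pixel test is a commonly cited fixed-threshold RGB skin rule from the skin-detection literature; the 0.4 flag threshold is an arbitrary placeholder, and decoding the image into pixels is out of scope. Expect many false positives (faces, sand, wood) and misses, so at best this could queue images for a human to look at:

```ruby
# Fixed-threshold RGB skin test; channels are 0-255.
def skin_pixel?(r, g, b)
  r > 95 && g > 40 && b > 20 &&
    [r, g, b].max - [r, g, b].min > 15 &&
    (r - g).abs > 15 && r > g && r > b
end

# Fraction of [r, g, b] pixels that pass the skin test.
def skin_ratio(pixels)
  return 0.0 if pixels.empty?
  pixels.count { |r, g, b| skin_pixel?(r, g, b) }.fdiv(pixels.size)
end

# Flag for human review; 0.4 is a hypothetical threshold, not a tuned value.
def needs_review?(pixels, threshold: 0.4)
  skin_ratio(pixels) >= threshold
end
```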
I suppose it depends on the scale of your "public data hosting service".
If it's something small, with maybe a couple hundred pictures per day flowing in, you can moderate them on your own.
If it's a couple hundred thousand, you'll need real people weeding out the bad ones: either a team of moderators or the users themselves, submitting abuse reports.
Which route to take can depend on your budget and the financial success of your service, as well as on the type of service. If it's something simple like Rapidshare, where users don't see each other's uploads, the chances that users will notice and report unacceptable content are small. If it's something very social like Flickr, you can bet reports will flow in.
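For the user-report route, one simple policy is to hide an item once enough distinct users have reported it and queue it for a moderator to make the final call. A minimal in-memory sketch; the class name and the default threshold of 3 are hypothetical:

```ruby
require "set"

# Hide an item once `threshold` distinct users report it, and queue it
# (once) for human review. Repeat reports from the same user count once.
class ReportQueue
  attr_reader :review_queue

  def initialize(threshold: 3)
    @threshold = threshold
    @reporters = Hash.new { |hash, item| hash[item] = Set.new }
    @hidden = Set.new
    @review_queue = []
  end

  def report(item_id, user_id)
    @reporters[item_id] << user_id
    return unless @reporters[item_id].size >= @threshold && !@hidden.include?(item_id)
    @hidden << item_id
    @review_queue << item_id
  end

  def hidden?(item_id)
    @hidden.include?(item_id)
  end
end
```

Counting distinct reporters, rather than raw reports, keeps a single user from hiding content on their own.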
I suppose you could automate something, but it's almost an impossible task. You can't reliably detect porn automatically. Nor can you reliably detect images that violate copyright: fingerprinting copyrighted material in order to compare it with uploaded content is a real challenge even for companies with the resources of Rapidshare, YouTube and others. For now, this kind of work can only be done effectively by humans.
There are also legal issues. In some countries the service owner is not liable for what users contribute (provided they cooperate in deleting content on request); in others, the owner can be charged for not having pre-moderated all incoming content. Think about this with regard to where you are going to launch.
I don't have links, but while it's certainly a difficult, error-prone task, software to detect improper content does exist. Or at least that's what the Security Manager at NASA told me; whether it was just a means to scare me, I don't know ;-)
