I'm building a website that stores a lot of community-generated multimedia.
I originally wanted to store the data in my own SAN, but for scalability purposes I'm looking for a solution to store all the images outside my network and request them by HTTP request connectors.
I also want a solution that will let me grow in space easily by clustering or by other way.
I guess that there is a term for what I'm looking for, but I'm really not sure what it is.
I'm not sure if this question belongs to SO or ServerFault. I'm asking from the perspective of a web developer, but maybe it qualifies best for a networking question. My apologies if I'm wrong.
Best to all and wish you a happy new year.
Well, there have been a lot of services offering just that, the most popular being Amazon's S3 ( used for example at 37Signal ). A lot of libraries exist for almost any language being used in webdev, so maybe thats something to get you started!
Here are a few services you could use:
Amazon S3 / Amazon CloudFront
Rackspace CloudFiles
OpSource Cloud Files
Box.net
The terms you are looking for is "cloud storage" and optionally "content delivery network" or CDN for short. For example on Amazon, S3 is the cloud storage and CloudFront the CDN.
Related
I was hoping to get a recommendation on the best way to store a database encryption key for HIPAA compliance as well as Amazon S3 file storage security. I have been searching stackoverflow and googling in general, but I just can't quite get a solid grasp whether what I'm specifically doing is sufficient. I don't want something I'm doing differently from prescribed methods to make my app insecure.
Currently, I have a Rails app that uses the gem attr_encrypted to encrypt sensitive patient identifying data in the database like name, ssn, address etc. I also store things like images of signatures and patient pictures in Amazon S3 uses server side encryption. I know I shouldn't hardcode the database encryption key in the application or in any file that might get verion controlled, but can I keep it in heroku's env config variables? How are those secured? How separate are they from the database (as in, if someone gets into heroku and steals a copy of the database, are the ENV variables vulnerable somehow as well?)? I currently keep my AWS keys in heroku env variables, is that safe? Also, what is the best pass phrase to use for the encryption? I am currently using 2 sentences from a random page in a book I have.
Please let me know if I'm being terribly naive with any of the procedures I've outlined, and I apologize in advance if I am asking naive questions. I'd like to be HIPAA compliant, but in addition I'd like piece of mind that I've gone beyond what HIPAA requires since from what I understand, HIPAA compliance does not always = actually secure.
Thanks everyone!
(This is more of a comment, but it was too long to be added as a comment):
#Eli: HIPAA doesn't actually mandate any specific technology. Which is good because it's a law and not as mutable as shifting technology.
#OP: Here is a whitepaper on building HIPAA-complaint apps on AWS. It should give you some good ideas. But Eli is correct in that you'll need to contact Heroku for their compliance information. Or you might just be better off migrating off of Heroku at this point. In my experience, it's a good prototyping platform, but it's easy (and expensive) to start bumping into it's limitations when dealing with production environments.
#FrederickCheung: Reading directly from /dev/random will block if there isn't enough entropy. It's generally recommended to use /dev/urandom or an actual crypto library if pseudo-random isn't good enough.
We are building an image and file hosting website and we will save these files on our servers, so I want to know if there are any best practices or standards I need to read and follow to make our website scalable and easy to extend in the future.
Is there a book or articles or videos talking about this subject, please share.
As per my experience to deal with large data.
its always best to opt for Cloud, check for "Amazon S3" (Amazon AWS) or Windows Azure.
features like "CDN" (cloud front) is a big plus.
I believe this is not a simple question that can be answered without knowing
how many files are expected ?
how many users/files accesses per hour/day/minute ?
your usage scenarios with this files (downloading? streaming? how many concurrent files downloaded at once?
are you stuck in one particular OS (windows) and filesystem (NTFS), or is there freedom in this ?
My personal note : Building own image/file hosting is not a trivial task, i strongly recommend you to hire somebody with experience from this area.
I would recommend that if possible, you look at a 3rd party solution that provides an api. you'll then get the benefits of lower cost of ownership, no maintenance costs for the hardware and continual updates thrown in for free when the 3rd party adds new features to the core offering. I know this from 1st hand experience as we scoped out the options for doing this in a recent project and came to the conclusion that we'd spend 100 times more on our own solution and even then, may not get it right. We opted for a company called Razuna who offer both a hosted and open source version of their platform. Their api is very straightfwd and can be consumed inside your mvc app with potentially only a few days effort (depending on your use case). The beauty of this approach is that the hosted elements are actually on the nirvanix backbone and are served via their CDN - so win win.
You can get the details at:
http://www.razuna.com
and can view the api docs at:
http://wiki.razuna.com/display/ecp/Developer+Guides
Good luck and if you need any further real-life guidence on this, feel free to come back. Oh and btw, we were also able to ask for 'paid for' features to be added to the core offering at pretty much standard market day rates.
How can i write a cloud-aware application? e.g. an application that takes benefit of being deployed on cloud. Is it same as an application that runs or a vps/dedicated server? if not then what are the differences? are there any design changes? What are the procedures that i need to take if i am to migrate an application to cloud-aware?
Also i am about to implement a web application idea which would need features like security, performance, caching, and more importantly free. I have been comparing some frameworks and found that django has least RAM/CPU usage and works great in prefork+threaded mode, but i have also read that django based sites stop to respond with huge load of connections. Other frameworks that i have seen/know are Zend, CakePHP, Lithium/Cake3, CodeIgnitor, Symfony, Ruby on Rails....
So i would leave this to your opinion as well, suggest me a good free framework based on my needs.
Finally thanks for reading the essay ;)
I feel a matrix moment coming on... "what is the cloud? The cloud is all around us, a prison for your program..." (what? the FAQ said bring your sense of humour...)
Ok so seriously, what is the cloud? It depends on the implementation but usual features include scalable computing resource and a charge per cpu-hour, storage area etc. So yes, it is a bit like developing on your VPS/a normal server.
As I understand it, Google App Engine allows you to consume as much as you want. The back-end resource management is done by Google and billed to you and you pay for what you use. I believe there's even a free threshold.
Amazon EC2 exposes an API that actually allows you to add virtual machine instances (someone correct me please if I'm wrong) having pre-configured them, deploy another instance of your web app, talk between private IP ranges if you wish (slicehost definitely allow this). As such, EC2 can allow you to act like a giant load balancer on the front-end passing work off to a whole number of VMs on the back end, or expose all that publicly, take your pick. I'm not sure on the exact detail because I didn't build the system but that's how I understand it.
I have a feeling (but I know least about Azure) that on Azure, resource management is done automatically, for you, by Microsoft, based on what your app uses.
So, in summary, the cloud is different things depending on which particular cloud you choose. EC2 seems to expose an API for managing resource, GAE and Azure appear to be environments which grow and shrink in the background based on your use.
Note: I am aware there are certain constraints developing in GAE, particularly with Java. In a minute, I'll edit in another thread where someone made an excellent comment on one of my posts to this effect.
Edit as promised, see this thread: Cloud Agnostic Architecture?
As for a choice of framework, it really doesn't matter as far as I'm concerned. If you are planning on deploying to one of these platforms you might want to check framework/language availability. I personally have just started Django and love it, having learnt python a while ago, so, in my totally unbiased opinion, use Django. Other developers will probably recommend other things, based on their preferences. What do you know? What are you most comfortable with? What do you like the most? I'd go with that. I chose Django purely because I'm not such a big fan of PHP, I like Python and I was comfortable with the framework when I initially played around with it.
Edit: So how do you write cloud-aware code? You design your software in such a way it fits on one of these architectures. Again, see the cloud-agnostic thread for some really good discussion on ways of doing this. For example, you might talk to some services on GAE which scale. That they are on GAE (example) doesn't really matter, you use loose coupling ideas. In essence, this is just a step up from the web service idea.
Also, another feature of the cloud I forgot to mention is the idea of CDN's being provided for you - some cloud implementations might move your data around the globe to make it more efficient to serve, or just because that's where they've got space. If that's an issue, don't use the cloud.
I cannot answer your question - I'm not experienced in such projects - but I can tell you one thing... both CakePHP and CodeIgniter are designed for PHP4 - in other words: for really old technology. And it seems nothing is going to change in their case. Symfony (especially 2.0 version which is still in heavy beta) is worth considering, but as I said on the very beginning - I can not support this with my own experience.
For designing applications for deployment for the cloud, the main thing to consider if recoverability. If your server is terminated, you may lose all of your data. If you're deploying on Amazon, I'd recommend putting all data that you need persisted onto an Elastic Block Storage (EBS) device. This would be data like user generated content/files, the database files and logs. I also use the EBS snapshot on a 5 day rotation so that's backed up itself. That said, I've had a cloud server up on AWS for over a year without any issues.
As for frameworks, I'm giving Grails a try at the minute and I'm quite enjoying it. Built to be syntactically similar to Rails but runs on the JVM. It means you can take advantage of all the Java goodness, like threading, concurrency and all the great libraries out there to build your web application.
I have a rails app that accepts file uploads and I wanted to know the best way to have common storage between servers. Since we have a number of windows applications we have used samba in the past, but as we build pure linux apps I would like to do this the best possible way.
We are expecting large amounts of data, so would need to scale this across multiple file servers.
I've used paperclip with an S3 backend.
If you want to have all the data in-house than a networked file-system might be the way to go. Try setting up AFS it scales pretty good.
Another good alternative is from the creators of Memcached:
Mogile FS
http://www.danga.com/mogilefs/
One easy way to do it is to use attachment_fu with an S3 backend.
I just watched the Windows Azure intro video and it left me feeling like it was a front end shell for hosted IIS instances. Can anyone who know more (possibily that was part of the beta) shed on why you would use this vs. EC2.
it seemed easy enough but really didnt give specifics on how it works, why it works or why you would use this vs the traditional solutions out there?
According to the vision (and I can only talk about the vision here since the product isn't really out yet), here's a couple of reasons you might consider Azure over EC2.
Azure includes built-in load balancing abilities. If you want to do that in Amazon, you have to roll your own solution or buy a third-party solution like www.RightScale.com.
Azure-friendly-coded apps can be delivered internally or in Microsoft's cloud. If you write apps that have confidential information like financial data or health care data, not all of your clients will be willing to put their data in the public cloud. In that case, they can deploy your apps internally on Windows. That's sold as a skillset win, because you can go from public to private projects. Don't get me wrong - if you master Amazon EC2 development, then you can deploy your apps internally with Linux virtual servers in your datacenter, but it's not as turnkey. (Hard to describe a tech preview as turnkey when it's not licensed yet, hahaha.)
Having said that, it wasn't clear that the load balancing functionality is included in the box with internal deployments. If you have to do a combination of Azure plus ISA Server, that'll be a tougher deployment and management sell.
AppHarbor is a .NET cloud hosting environment that sits on Amazon EC2. The nice thing is they offer a free plan (much like Heroku does) so you can check it out yourself with very little friction.
My company is using Amazon EC2 now and I am down at the PDC watching the details on Azure unfold. I have not seen anything yet that would convince us to move away from Amazon. Azure definitely looks compelling, but the fact is I can now utilize Windows and SQL server on Amazon with SLAs in place. Ray Ozzie made it clear that Azure will be changing A LOT based on feedback from the developer community. However, Azure has a lot of potential and we'll be watching it closely.
Also, Amazon will be adding load balancing, autoscaling and dashboard features in upcoming updates to the service (see this link: http://aws.amazon.com/contact-us/new-features-for-amazon-ec2/). Never underestimate Amazon as they have a good headstart on Cloud Computing and a big user base helping refine their offerings already. Never underestimate Microsoft either as they have a massive developer community and global reach.
Overall I do not think the cloud services of one company are mutually exclusive from one another. The great thing is that we can leverage all of them if we want to.
Microsoft should offer up the ability to host Linux based servers in their cloud. That would really turn the world upside down!
Well it's more than just web services. It will also allow you to host other types of connected applications. Plus it provides integrated access to other MS software on the cloud; i.e. SharePoint, Exchange, CRM, SQL data sevices, and will allow you to fully customize and extend those offerings in the same way that you would be able to customize and extend them if they were hosted on-premises.
At the Archtect Insight Conference last year they mentioned that they have started to alter core server products to deal with the large scale failover environment which is very interesting to me at least.
Its bunch of stuff that is coming into the Cloud. I think of this as more of Platform in the Cloud.
Sql Server
CRM
MOSS
Exchange
BizTalk
Geneva (identity)
The terms that are mentioned here are "STORE" and "COMPUTE"
For me this get really intersting around the IDEA of a Internet Service Bus.
It is also about moving to the development workflow process too.
OSLO DSLs and Qudrant - Moving to a Model Driven View
Entity Framework - giving developers strong typed model in code at a click of button
ADO Data Services and Data Dynamic Webtemplates using MVC
Then with the Azure Templates and the new "WebRoles" moving to deployment of the applications to the Cloud.
Then for the Admins one click provisioning of servers is awsome.
On the Data Privacy Rules... which is the one big elephant in the room and has been mentioned... Typically there is the often a ruling in each Country about information security.
UK RIPA
US Patriot Act
Are these really conceptully different? And these 2 countries do share information anyway...IMHO (legally they are different, but to a customer both laws give access to customer data its just question of who)
At this point, information on Windows Azure is pretty scarce. I was in the keynote during the announcement, and my best guess at this point is that they're trying to provide a more extensive virtualization environment than simply hosted IIS instances.
At this point, though, I can't say more than that.
We use S3 for storage very successfully and I've always kept an eye on EC2 for Windows and SQL Server support. So now these are available I dug further.
I was pretty worried when I read this:
http://www.brentozar.com/archive/2008/11/bad-storage-performance-on-amazon-ec2-windows-servers/
Perhaps, as we're developing what will hopefully become a very popular website, we should be considering the new data store models - Azure's or Amazon's SimpleDB. Hmmmmm - complete rewrite!
The major difference going forward is that Amazon EC2 is free from today Nov 1, Check this out.
http://www.buzzingup.com/2010/10/amazon-announces-free-cloud-services-for-new-developers/