Image/File hosting storage best practices and standards - asp.net-mvc

We are building an image and file hosting website and we will save these files on our servers, so I want to know if there are any best practices or standards I need to read and follow to make our website scalable and easy to extend in the future.
If there are any books, articles, or videos on this subject, please share them.

In my experience dealing with large amounts of data, it's usually best to opt for the cloud: look at Amazon S3 (Amazon AWS) or Windows Azure.
Features like a CDN (CloudFront) are a big plus.
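To give a feel for how little code that takes, here is a minimal sketch of pushing an uploaded file to S3 with the AWS SDK for .NET. The bucket name and key layout are assumptions for illustration, not anything prescribed:

```csharp
using System.Threading.Tasks;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;

class S3UploadSketch
{
    // Hypothetical bucket name and key scheme; substitute your own.
    const string BucketName = "my-file-hosting-bucket";

    public static async Task UploadAsync(string localPath, string fileName)
    {
        using (var client = new AmazonS3Client(RegionEndpoint.USEast1))
        {
            var request = new PutObjectRequest
            {
                BucketName = BucketName,
                Key = "uploads/" + fileName,   // key layout is an assumption
                FilePath = localPath
            };
            await client.PutObjectAsync(request);
        }
        // Serving the object through CloudFront (the CDN mentioned above) is then
        // mostly a matter of pointing a distribution at this bucket.
    }
}
```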

I believe this is not a simple question that can be answered without knowing:
How many files are expected?
How many user/file accesses per hour/day/minute?
What are your usage scenarios for these files (downloading? streaming? how many concurrent downloads at once?)
Are you tied to one particular OS (Windows) and filesystem (NTFS), or is there freedom here?
My personal note: building your own image/file hosting is not a trivial task. I strongly recommend you hire somebody with experience in this area.

I would recommend that, if possible, you look at a third-party solution that provides an API. You then get the benefits of lower cost of ownership, no maintenance costs for the hardware, and continual updates thrown in for free when the third party adds new features to the core offering. I know this from first-hand experience: we scoped out the options for doing this in a recent project and came to the conclusion that we'd spend 100 times more on our own solution and even then might not get it right. We opted for a company called Razuna, who offer both a hosted and an open-source version of their platform. Their API is very straightforward and can be consumed inside your MVC app with potentially only a few days' effort (depending on your use case). The beauty of this approach is that the hosted elements are actually on the Nirvanix backbone and are served via their CDN, so it's a win-win.
You can get the details at:
http://www.razuna.com
and can view the api docs at:
http://wiki.razuna.com/display/ecp/Developer+Guides
Good luck, and if you need any further real-life guidance on this, feel free to come back. Oh, and by the way, we were also able to ask for 'paid for' features to be added to the core offering at pretty much standard market day rates.
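To make the "few days of effort" claim concrete, consuming a hosted API like this from an MVC app usually boils down to an HTTP call from a controller action. The endpoint URL and parameter names below are placeholders I made up, not Razuna's actual API (see their developer guides for the real calls):

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using System.Web;
using System.Web.Mvc;

public class AssetsController : Controller
{
    // A shared HttpClient; the endpoint below is a made-up placeholder.
    static readonly HttpClient Http = new HttpClient();

    [HttpPost]
    public async Task<ActionResult> Upload(HttpPostedFileBase file)
    {
        using (var content = new MultipartFormDataContent())
        {
            // Stream the user's upload straight through to the hosted service.
            content.Add(new StreamContent(file.InputStream), "file", file.FileName);
            var response = await Http.PostAsync("https://dam.example.com/api/assets", content);
            response.EnsureSuccessStatusCode();
        }
        return RedirectToAction("Index");
    }
}
```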

Related

CMS, or pre-baked solutions for community file sharing

I want to create a community around a current iPhone app I've built. It will allow registered users to upload and download small configuration or settings files, which are used in my app to customize functionality. These files are serialized plists (binary files around 500 bytes), but can be converted to a JSON or XML format if necessary.
I do not need an HTML front-end; I plan for it to be accessed only via my app. Files do not need to be private or secure. I do not plan to store or ask for any user private data--just a login and password.
I'm looking for tips that might get me close to my goals with the least amount of effort - I want to focus on the core functionality of the app, and have this as a stable feature that I can add to in the future if it is useful. I would of course prefer FOSS, but a commercial solution is not out of the question. Things like file sharing sites with apis, login ideas, and so on.
So, what software solutions are out there that I may not be aware of? I know that Drupal has modules to allow user logins. Is there something that would work not as a web app, but as a service only? Dropbox has file sharing and an API, but I'm not sure I could use it the way I'm intending.
In short, I could code this, but would prefer a pre-baked solution that would deal with things I may not have thought of. I am sure there must be something out there which I can use.
More Details, and what I plan on the service offering:
Registration of users via the iPhone, and all that entails (will code the UI myself--I just want an API to connect to)
Viewing of these files quickly and efficiently (the files were built with performance in mind, and this is a free app, so I would like to keep server costs down)
Uploading their own files, with a few integrity checks
Rating the files
Gathering statistics on usage (which files were downloaded most often), etc., to provide a way for the files to be ranked by rating, popularity, etc.
Optional - submitting revised versions of the files (a tree).
Optional but preferred - statistics on users (no. files uploaded, perhaps rewards system for sharing)
I'm just not up to date with current technologies and open source solutions. I have experience in SQL, relational database design, and have built backends in Java, so a custom solution is not out of the question. However, it's been a while, I'm not a security expert, and would prefer to not reinvent the wheel for what is a fairly simple project, so an off-the-shelf solution would be preferred.
Check out www.parse.com!
It is absolutely brilliant for stuff like this.
You may want to look at source versioning systems like SVN, or distributed systems like Mercurial or Git. These would work much better if the data were serialized to a text format, like the JSON or XML you mentioned.
Registration would need to be done by you of course
Viewing of files (including changes, of course) is quick and efficient. The interface can be done in a number of ways, even simulating command-line.
Uploading files will of course work, and changes made will be stored as diffs. Integrity checks can be done, for example, by Mercurial plugins
Rating the files probably can't be done directly unless you wanted an awkward hack involving parsing change entries or writing a plugin.
Submitting revised versions of files would work as that is the raison d'ĂȘtre of versioning systems.
Some statistics are made available in VCSs.
This is honestly a bit of a strange use for version control systems and not altogether elegant, but sometimes that's what innovation is about.
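If you do go down this road, the plumbing can be as simple as writing the uploaded file into a working copy and shelling out to the version control tool. A rough sketch of the idea using Git from .NET (the repository path and commit message format are my own assumptions):

```csharp
using System.Diagnostics;
using System.IO;

class VcsFileStore
{
    // Path to a repository that has already been initialised; an assumption for this sketch.
    const string RepoPath = @"C:\data\configs-repo";

    public static void SaveRevision(string fileName, byte[] contents, string user)
    {
        File.WriteAllBytes(Path.Combine(RepoPath, fileName), contents);
        RunGit("add " + fileName);
        RunGit("commit -m \"" + user + " uploaded " + fileName + "\"");
        // Each upload becomes a commit, so the revision history (the 'tree' of
        // revised versions mentioned above) comes for free.
    }

    static void RunGit(string args)
    {
        var psi = new ProcessStartInfo("git", args)
        {
            WorkingDirectory = RepoPath,
            UseShellExecute = false
        };
        using (var process = Process.Start(psi))
        {
            process.WaitForExit();
        }
    }
}
```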
I suggest TikiWiki.
Pros:
Out of the box it has all you need to build a community.
It's FOSS
It has 200 active developers - so it really has a lot of momentum.
Cons:
So many out-of-the-box features that it suffers from feature bloat. Configuration and initial set-up may be complicated.
Not really oriented to mobile platforms.

What language/framework (technology) to use for website (flash games portal)

----edit-----
QUICK QUESTION: Does Grails take too many resources for a high-traffic website, and is it more expensive to host?
For example: if I can build a site with millions of users per month more easily in CakePHP, is it worth building it in Grails just to save some web server resources, or will it end up needing more servers?
---------------
Hello,
I know there are a lot of similar questions on the net, but because I am a newbie in web development I didn't find a solution to my specific problem.
I am planning to create a flash games portal from scratch. There is a good chance of big traffic from the beginning (millions of pageviews). I want to reduce the server costs as much as possible, but at the same time not be tied to an expensive contract, since there is a chance the project will not be as successful as I hope, and in that case the budget would be very small.
The question is: what technology should I use? I don't know any web dev technology yet, so it doesn't matter what I have to learn. My web dev experience is a little PHP 8 years ago; since then I have programmed in C++/Java, doing game and mobile development. I like the Java and C syntax and languages very much, and I tend to dislike dynamic typing and less robust scripting (like PHP), but I can get along with them if they are the best choices.
The candidates are now:
Grails (my best bet for now)
Ruby on Rails
CakePHP
Other technologies (Google App Engine, Python/Django, etc.)
I was considering at first writing in pure C and compiling the web app on the server, just to squeeze more out of the servers, but I soon understood that this is overkill.
Next my eye fell on Ruby, as there is a lot of buzz about its ease of use. Then I discovered Grails and looked at Java, because it is said to be "faster". But I don't know what this "faster" really means for my needs, so here comes the first question:
1) What will be my biggest resource consumption on the server, other than bandwidth, for a lot of flash content requests? Is it memory? I heard that Java needs a lot of memory but is faster. Is it CPU? I am planning to take some daily VPS.NET nodes at first, to see if there is demand, and if the "spike" is permanent to move to a dedicated server (serverloft.com has some good offers), otherwise to stay with fewer nodes.
I was also considering developing on Google App Engine: cheap or free hosting to use at first, so I can test my assumptions, and also very easy to use (no need for sysadmin work), but the costs become high with heavier use (> 3 million games played / month .. x MB each). And the issue with Google is that it locks me into this technology.
My other concern is scalability (not only for traffic/users, but also for adding functionality). My plan is to release a functional site in just 4 weeks (just a basic frontend and some quick basic backend, so I am able to modify some things and add games manually), and then to grow it and add more things to it. I am planning to take a slightly different approach than other portals, so I need to write it from scratch (an off-the-shelf script will not do).
2) Will Grails take much more server resources than RoR or PHP? I heard that building on a Java stack is expensive hardware-wise and overkill unless you are building a banking application. My application will not be very complex (I hope, and I will try to keep it that way), but it will have a lot of traffic.
I also took into account using a CDN for files, but the cheapest CDN I found was 5c/GB (vps.net), while the cost per GB on serverloft (http://www.serverloft.com/dedizierte-server/server-details.php?products=4) is only 1.79 cents/GB and comes bundled with the other resources anyway.
I am new to this domain (web). I have been learning the ropes and searching the web for about half a year, but I don't have any real practical experience, so I know I must have some naive assumptions and other issues I'm not yet aware of. Please give me any advice you have regarding anything, not just the specific questions asked.
And thank you so much for such a great community!
This is how I (on my blog) view web performance, especially for highly abstracted frameworks like Grails.
I don't understand the obsession with runtime performance. Given most project scenarios, your primary focus should be on your performance, as in your ability to get things done with a chosen technology.
For example, you will get more done in a given period of time with Groovy than with Java any day. Often one line of Groovy code will equate to ten lines of Java code, and so on.
Very rarely will bytecode execution time be your performance issue; most often it's:
Bad algorithm implementation or design.
Bad DB design and/or queries.
Taking too long to get things done, and then having all sorts of commercial relationship issues because of it.
With web applications you are usually not performing lots of long-running CPU-bound operations. Most of your request/response time is spent on the wire (internet routing etc.) and in the DB (executing queries).
Choose a technology that takes a load off your mind and one that frees you from writing mountains of boilerplate code, so that you can instead concentrate on designing and implementing good algorithms, DBs, queries, and so on.

Cloud-aware programming and help choosing a good framework

How can I write a cloud-aware application, i.e. an application that takes advantage of being deployed on the cloud? Is it the same as an application that runs on a VPS/dedicated server? If not, what are the differences? Are there any design changes? What steps do I need to take to migrate an application to being cloud-aware?
Also, I am about to implement a web application idea which would need features like security, performance, caching, and, more importantly, be free. I have been comparing some frameworks and found that Django has the lowest RAM/CPU usage and works great in prefork+threaded mode, but I have also read that Django-based sites stop responding under a huge load of connections. Other frameworks that I have seen or know of are Zend, CakePHP, Lithium/Cake3, CodeIgniter, Symfony, Ruby on Rails...
So I would leave this to your opinion as well: please suggest a good free framework based on my needs.
Finally, thanks for reading the essay ;)
I feel a matrix moment coming on... "what is the cloud? The cloud is all around us, a prison for your program..." (what? the FAQ said bring your sense of humour...)
Ok so seriously, what is the cloud? It depends on the implementation but usual features include scalable computing resource and a charge per cpu-hour, storage area etc. So yes, it is a bit like developing on your VPS/a normal server.
As I understand it, Google App Engine allows you to consume as much as you want. The back-end resource management is done by Google and billed to you and you pay for what you use. I believe there's even a free threshold.
Amazon EC2 exposes an API that allows you to add virtual machine instances (someone please correct me if I'm wrong), having pre-configured them, deploy another instance of your web app, and talk between private IP ranges if you wish (Slicehost definitely allows this). As such, EC2 can let you run a giant load balancer on the front end, passing work off to a whole number of VMs on the back end, or expose all of that publicly; take your pick. I'm not sure of the exact details because I didn't build the system, but that's how I understand it.
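For what it's worth, adding an instance through that API boils down to a single call. Here's a rough sketch with the AWS SDK for .NET; the AMI ID is a placeholder and the instance type is just an example:

```csharp
using System.Threading.Tasks;
using Amazon;
using Amazon.EC2;
using Amazon.EC2.Model;

class Ec2ScaleOut
{
    public static async Task LaunchWorkerAsync()
    {
        using (var ec2 = new AmazonEC2Client(RegionEndpoint.USEast1))
        {
            var request = new RunInstancesRequest
            {
                ImageId = "ami-12345678",          // placeholder AMI with your app baked in
                InstanceType = InstanceType.T2Micro,
                MinCount = 1,
                MaxCount = 1
            };
            await ec2.RunInstancesAsync(request);
            // Your front-end load balancer can then start routing work to the new VM.
        }
    }
}
```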
I have a feeling (but I know least about Azure) that on Azure, resource management is done automatically, for you, by Microsoft, based on what your app uses.
So, in summary, the cloud is different things depending on which particular cloud you choose. EC2 seems to expose an API for managing resource, GAE and Azure appear to be environments which grow and shrink in the background based on your use.
Note: I am aware there are certain constraints developing in GAE, particularly with Java. In a minute, I'll edit in another thread where someone made an excellent comment on one of my posts to this effect.
Edit as promised, see this thread: Cloud Agnostic Architecture?
As for a choice of framework, it really doesn't matter as far as I'm concerned. If you are planning on deploying to one of these platforms you might want to check framework/language availability. I personally have just started Django and love it, having learnt python a while ago, so, in my totally unbiased opinion, use Django. Other developers will probably recommend other things, based on their preferences. What do you know? What are you most comfortable with? What do you like the most? I'd go with that. I chose Django purely because I'm not such a big fan of PHP, I like Python and I was comfortable with the framework when I initially played around with it.
Edit: So how do you write cloud-aware code? You design your software in such a way it fits on one of these architectures. Again, see the cloud-agnostic thread for some really good discussion on ways of doing this. For example, you might talk to some services on GAE which scale. That they are on GAE (example) doesn't really matter, you use loose coupling ideas. In essence, this is just a step up from the web service idea.
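One way to keep that loose coupling concrete is to hide the storage (or any other cloud service) behind an interface, so the calling code doesn't care which cloud, if any, sits underneath. A sketch, with the interface and class names being my own invention:

```csharp
using System.IO;
using System.Threading.Tasks;

// The abstraction the rest of the application codes against.
public interface IFileStore
{
    Task SaveAsync(string key, Stream data);
    Task<Stream> OpenAsync(string key);
}

// Local-disk implementation for development or a plain VPS.
public class LocalFileStore : IFileStore
{
    readonly string _root;

    public LocalFileStore(string root) { _root = root; }

    public async Task SaveAsync(string key, Stream data)
    {
        using (var file = File.Create(Path.Combine(_root, key)))
        {
            await data.CopyToAsync(file);
        }
    }

    public Task<Stream> OpenAsync(string key)
    {
        return Task.FromResult<Stream>(File.OpenRead(Path.Combine(_root, key)));
    }
}

// A cloud-backed implementation (S3, Azure blobs, GAE storage, ...) would implement
// the same interface, so moving between clouds becomes a configuration change
// rather than a rewrite.
```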
Also, another feature of the cloud I forgot to mention is the idea of CDN's being provided for you - some cloud implementations might move your data around the globe to make it more efficient to serve, or just because that's where they've got space. If that's an issue, don't use the cloud.
I cannot answer your question - I'm not experienced in such projects - but I can tell you one thing... both CakePHP and CodeIgniter were designed for PHP4, in other words for really old technology, and it seems nothing is going to change in their case. Symfony (especially the 2.0 version, which is still in heavy beta) is worth considering, but as I said at the very beginning, I cannot back this up with my own experience.
For designing applications for deployment to the cloud, the main thing to consider is recoverability. If your server is terminated, you may lose all of your data. If you're deploying on Amazon, I'd recommend putting all data that you need persisted onto an Elastic Block Storage (EBS) volume. That would be data like user-generated content/files, the database files and logs. I also take an EBS snapshot on a 5-day rotation so the volume itself is backed up. That said, I've had a cloud server up on AWS for over a year without any issues.
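As an illustration of that snapshot rotation, the backup step is one API call; the volume ID below is a placeholder, and in practice you'd run something like this from a scheduled task:

```csharp
using System;
using System.Threading.Tasks;
using Amazon;
using Amazon.EC2;
using Amazon.EC2.Model;

class EbsBackup
{
    public static async Task SnapshotDataVolumeAsync()
    {
        using (var ec2 = new AmazonEC2Client(RegionEndpoint.USEast1))
        {
            var request = new CreateSnapshotRequest
            {
                VolumeId = "vol-0123456789abcdef0",   // placeholder EBS volume ID
                Description = "Nightly backup " + DateTime.UtcNow.ToString("yyyy-MM-dd")
            };
            await ec2.CreateSnapshotAsync(request);
            // Pruning snapshots older than the rotation window (5 days above)
            // would be a separate cleanup step.
        }
    }
}
```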
As for frameworks, I'm giving Grails a try at the minute and I'm quite enjoying it. Built to be syntactically similar to Rails but runs on the JVM. It means you can take advantage of all the Java goodness, like threading, concurrency and all the great libraries out there to build your web application.

Windows Azure for web developers vs Amazon EC2

I just watched the Windows Azure intro video and it left me feeling like it was a front-end shell for hosted IIS instances. Can anyone who knows more (possibly from being part of the beta) shed some light on why you would use this vs. EC2?
It seemed easy enough, but the video really didn't give specifics on how it works, why it works, or why you would use this vs. the traditional solutions out there.
According to the vision (and I can only talk about the vision here since the product isn't really out yet), here's a couple of reasons you might consider Azure over EC2.
Azure includes built-in load balancing abilities. If you want to do that in Amazon, you have to roll your own solution or buy a third-party solution like www.RightScale.com.
Azure-friendly-coded apps can be delivered internally or in Microsoft's cloud. If you write apps that have confidential information like financial data or health care data, not all of your clients will be willing to put their data in the public cloud. In that case, they can deploy your apps internally on Windows. That's sold as a skillset win, because you can go from public to private projects. Don't get me wrong - if you master Amazon EC2 development, then you can deploy your apps internally with Linux virtual servers in your datacenter, but it's not as turnkey. (Hard to describe a tech preview as turnkey when it's not licensed yet, hahaha.)
Having said that, it wasn't clear that the load balancing functionality is included in the box with internal deployments. If you have to do a combination of Azure plus ISA Server, that'll be a tougher deployment and management sell.
AppHarbor is a .NET cloud hosting environment that sits on Amazon EC2. The nice thing is they offer a free plan (much like Heroku does) so you can check it out yourself with very little friction.
My company is using Amazon EC2 now and I am down at the PDC watching the details on Azure unfold. I have not seen anything yet that would convince us to move away from Amazon. Azure definitely looks compelling, but the fact is I can now utilize Windows and SQL server on Amazon with SLAs in place. Ray Ozzie made it clear that Azure will be changing A LOT based on feedback from the developer community. However, Azure has a lot of potential and we'll be watching it closely.
Also, Amazon will be adding load balancing, autoscaling and dashboard features in upcoming updates to the service (see this link: http://aws.amazon.com/contact-us/new-features-for-amazon-ec2/). Never underestimate Amazon as they have a good headstart on Cloud Computing and a big user base helping refine their offerings already. Never underestimate Microsoft either as they have a massive developer community and global reach.
Overall I do not think the cloud services of one company are mutually exclusive from one another. The great thing is that we can leverage all of them if we want to.
Microsoft should offer up the ability to host Linux based servers in their cloud. That would really turn the world upside down!
Well, it's more than just web services. It will also allow you to host other types of connected applications. Plus it provides integrated access to other MS software in the cloud, i.e. SharePoint, Exchange, CRM, SQL Data Services, and will allow you to fully customize and extend those offerings in the same way that you would be able to customize and extend them if they were hosted on-premises.
At the Architect Insight Conference last year they mentioned that they have started to alter core server products to deal with the large-scale failover environment, which is very interesting to me at least.
It's a bunch of stuff that is coming into the cloud. I think of this as more of a platform in the cloud:
SQL Server
CRM
MOSS
Exchange
BizTalk
Geneva (identity)
The terms that are mentioned here are "STORE" and "COMPUTE".
For me this gets really interesting around the idea of an Internet Service Bus.
It is also about moving the development workflow process along too:
Oslo DSLs and Quadrant - moving to a model-driven view
Entity Framework - giving developers a strongly typed model in code at the click of a button
ADO.NET Data Services and Dynamic Data web templates using MVC
Then, with the Azure templates and the new "WebRoles", moving to deployment of the applications to the cloud.
Then, for the admins, one-click provisioning of servers is awesome.
On the data privacy rules, which is the one big elephant in the room and has been mentioned already: there is typically a ruling in each country about information security.
UK RIPA
US Patriot Act
Are these really conceptually different? And these two countries do share information anyway... IMHO legally they are different, but to a customer both laws give access to customer data; it's just a question of who.
At this point, information on Windows Azure is pretty scarce. I was in the keynote during the announcement, and my best guess at this point is that they're trying to provide a more extensive virtualization environment than simply hosted IIS instances.
At this point, though, I can't say more than that.
We use S3 for storage very successfully and I've always kept an eye on EC2 for Windows and SQL Server support. So now these are available I dug further.
I was pretty worried when I read this:
http://www.brentozar.com/archive/2008/11/bad-storage-performance-on-amazon-ec2-windows-servers/
Perhaps, as we're developing what will hopefully become a very popular website, we should be considering the new data store models - Azure's or Amazon's SimpleDB. Hmmmmm - complete rewrite!
The major difference going forward is that Amazon EC2 offers a free tier for new developers as of today, Nov 1. Check this out:
http://www.buzzingup.com/2010/10/amazon-announces-free-cloud-services-for-new-developers/

What is the best Delphi n-tier low bandwidth technology?

I need to deploy a Delphi app in an environment that needs a centralized data and file storage system (for document imaging) but has multiple branch offices with relatively poor interconnectivity. I believe a 3-tier database application is the best way to go, so I can provide a rich desktop experience with relatively lightweight data transfer needs. So far I have looked briefly at Delphi DataSnap, kbmMW and RemObjects SDK. It seems that kbmMW and RemObjects SDK use the least bandwidth. Does anyone have any experience deploying any of these technologies in a challenging environment with a significant number of users (I need to support 700+)? Thanks!
Depends if you are tied to remote datasets. If you aren't dataset bound then SOAP would likely be a good choice. Or, what I've done is write my own protocol that is similar to SOAP in nature. This was done before SOAP was standard and I'm glad I did - this gives you the ability to control more of the flow of data. It's given that if you have poor connectivity then you will be spending time supporting it. It's very nice if it's your own code you are supporting versus having to wait on a vendor. (Although KBM and REM are known to be pretty good vendors.)
Personal note: 700 users in a document imaging application over poor connectivity sounds like a mess. Spend the money on upgrading connectivity as it'll be cheaper in the long run.
Both kbmMW and the RemObjects SDK offer a binary format, which is more compact than the SOAP format, especially when you are working with documents.
The RemObjects SDK seems to offer more GUI tools to help you build your services.
Also give the RealThinClient SDK a look; it's a lightweight remoting framework.
But whatever framework you go with, your design will make it fast or slow. I have some applications working over slow 128 kb lines, and they work perfectly without any user complaints, but I don't do large file transfers.
One thing to remember... it's not the number of users, but the number of them using resources at the same time, that will be the issue. Attempt to develop your application "server stateless" if at all possible; this will allow greater flexibility in the long term if you find you have to add more servers to the pool to support your customer base. The hardest thing about n-tier is scaling beyond the first server, so plan on that from the start. Each request should not know anything about a prior request, or at the very least the request should have a way of passing the context so the server can look it up in a session table or something.
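A minimal sketch of that "pass the context, look it up per request" idea; the session table and token names are invented for illustration, and in a real multi-server pool the table would live in a shared database rather than in memory:

```csharp
using System;
using System.Collections.Concurrent;

public class SessionTable
{
    // Token-to-context lookup; shared storage in a real deployment.
    readonly ConcurrentDictionary<Guid, string> _sessions = new ConcurrentDictionary<Guid, string>();

    public Guid Start(string userName)
    {
        var token = Guid.NewGuid();
        _sessions[token] = userName;
        return token;
    }

    public string HandleRequest(Guid token, string payload)
    {
        string user;
        if (!_sessions.TryGetValue(token, out user))
            throw new InvalidOperationException("Unknown session token");
        // The request carries everything needed to re-establish its context,
        // so any server in the pool can handle it.
        return "Processed '" + payload + "' for " + user;
    }
}
```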
Personally, I would recommend RemObjects. I have used it with good results.
I don't know if it's the very best / most efficient (glad you asked this question!), but I've had good results w/RemObjects SDK + DataAbstract. The latter made much of the plumbing details less involved, which was helpful. Still implementing, but so far so good.
If you really want to go "low-bandwidth", use the BSD sockets API - that'll give you full control over what's being sent, and you can send as little information as you want. Of course then you'll have to implement all the tiers yourself, but hey - that's still an option :D
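The original question is about Delphi, but just to show the shape of the raw-socket approach (sketched here in C#, with a wire format invented purely for illustration): a one-byte opcode plus a record id is all that goes over the line.

```csharp
using System;
using System.Net.Sockets;

class TinyProtocolClient
{
    // Invented wire format: 1-byte opcode + 4-byte record id, nothing else.
    public static byte[] FetchRecord(string host, int port, int recordId)
    {
        using (var client = new TcpClient(host, port))
        using (var stream = client.GetStream())
        {
            var request = new byte[5];
            request[0] = 0x01;                                   // opcode: "get record"
            BitConverter.GetBytes(recordId).CopyTo(request, 1);
            stream.Write(request, 0, request.Length);

            var buffer = new byte[4096];
            int read = stream.Read(buffer, 0, buffer.Length);
            var payload = new byte[read];
            Array.Copy(buffer, payload, read);
            return payload;                                      // caller decodes the payload
        }
    }
}
```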
