Other than the monetary aspects, how different is Amazon's SimpleDB from Apache's CouchDB in the following terms:
Interfacing with programming languages like Java, C++ etc
Performance and Scalability
Installation and maintenance
I'm a fairly heavy SimpleDB user (I'm the developer of http://www.backupsdb.com/) but am currently migrating some projects off SimpleDB and into Couch, so I guess I can see this from both sides now.
1. Interfacing with programming languages like Java, C++ etc
Easier with Couch, as you talk to it over plain HTTP using JSON. SimpleDB is a bit more work, largely due to the complexity of signing each request for security and the lower-level access you get, which requires you to implement exponential backoff when the service returns busy signals, etc. You can get good libraries for SimpleDB in many languages now, though, and these take much of the pain away.
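To make the "exponential backoff" chore concrete, here is a minimal sketch of a generic retry wrapper. It assumes your SimpleDB client signals a busy response by raising an exception; `ServiceBusyError` and `with_backoff` are hypothetical names, not part of any SimpleDB library, and you would map your library's real error onto them.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceBusyError(Exception):
    """Hypothetical stand-in for whatever 'busy' error your SimpleDB library raises."""


def with_backoff(
    request_fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request_fn, retrying with exponential backoff while the service is busy."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except ServiceBusyError:
            if attempt == max_attempts - 1:
                raise  # out of retries: let the caller see the busy error
            # The delay ceiling doubles on each retry (0.1 s, 0.2 s, 0.4 s, ...) up to
            # max_delay; sleeping a random fraction of it ("full jitter") keeps many
            # clients from retrying in lockstep.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable only so the wrapper can be tested without real waiting; in normal use you would call something like `with_backoff(lambda: client.select(query))`, where `client.select` is a placeholder for your library's call.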
2. Performance and Scalability
I don't have any benchmarks, but for my own use case CouchDB outperforms SimpleDB. It's harder to scale, though. SimpleDB is great at that: you chuck more at it and it autoscales around you.
There are lots of occasionally irritating limits in SimpleDB, though: limits on the number of attributes, the size of attributes, the number of domains, etc. The main annoyance for many applications is the attribute size limit (1,024 bytes per attribute value), which means you can't store large forum posts, for example. The workaround is to offload those into something else such as S3, but it's a bit annoying at times. Obviously CouchDB doesn't have that issue, and indeed the fact that you can attach large files to documents is one thing that particularly attracts me to it.
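The "offload to S3" workaround usually means storing a pointer in the attribute and the real value elsewhere. The sketch below assumes a plain dict as a stand-in for S3 (key to bytes) and a hypothetical `blob:` prefix to mark pointers; a real version would call your S3 client instead of writing to the dict.

```python
import hashlib
from typing import Dict

MAX_ATTR_BYTES = 1024      # SimpleDB's per-attribute-value limit
POINTER_PREFIX = "blob:"   # hypothetical marker meaning "value lives in the blob store"


def to_attribute(value: str, blob_store: Dict[str, bytes]) -> str:
    """Return a value safe to store as an attribute, offloading large ones.

    blob_store is a hypothetical stand-in for S3 (key -> bytes).
    """
    data = value.encode("utf-8")
    # Small values go in as-is, unless they would be mistaken for a pointer.
    if len(data) <= MAX_ATTR_BYTES and not value.startswith(POINTER_PREFIX):
        return value
    key = hashlib.sha256(data).hexdigest()  # content-addressed key
    blob_store[key] = data
    return POINTER_PREFIX + key


def from_attribute(attr: str, blob_store: Dict[str, bytes]) -> str:
    """Resolve an attribute value, fetching it from the blob store if it is a pointer."""
    if attr.startswith(POINTER_PREFIX):
        return blob_store[attr[len(POINTER_PREFIX):]].decode("utf-8")
    return attr
```

Using a content hash as the key is just one convenient choice; an item ID plus attribute name would work equally well.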
Scaling-wise, you should possibly also be looking at BigCouch, which gives you a distributed cluster and is closer to what you get with SDB.
3. Installation and Maintenance
I actually found it much easier with CouchDB. I suspect it depends on which library you need to use for SimpleDB, but when I was starting with it, the Amazon-supplied libraries weren't very mature and the open-source community ones had various issues, which meant that getting up and running and doing something serious with it took more time than I would have liked. I suspect this is much better now.
CouchDB was surprisingly easy to install and I love its web interface. Indeed, that would be my major criticism of SimpleDB: Amazon still doesn't have any form of web console for it, despite having web consoles for almost every other service. That's why we wrote the very basic BackupSDB, just so we could extract data in XML and run queries from a web browser. I'd have liked to see Amazon do something similar (but more powerful and better) by now, and I'm very surprised that they haven't. There are lots of third-party Firefox plugins and some applications for it, but I have the impression that SimpleDB isn't that widely used. This is only a hunch, really.
4. Other Observations
The biggest issue, I think, is that with SimpleDB you are entrusting all your data to a third party with no easy way of getting it out (you'll need to write something to do that), plus your costs keep gently rising. When you get to the point where the cost is comparable to a powerful dedicated database server, you feel you'd get better value that way, but by then the migration headache is non-trivial, as you'll have a large commitment to the cloud.
I started off as a huge Amazon evangelist, and for most things I still am, but when it comes to SDB, I feel it's a bit of a hobby project for Amazon, the way the Apple TV was for Steve Jobs.
We've got an Excel spreadsheet floating around (globally) at my company right now to capture various pieces of information about each country's technology usage. The problem is that it goes out and gets changed, but the changes are never obvious and often conflict, and then we have to smash them together. To me, the workbook is no more than a garbage-in/garbage-out application waiting to be written.
In a company that has enough staff and knowledge to dedicate to enterprise projects, for some reason agile methods and languages/frameworks such as Rails, Grails, etc. are frowned upon. That said, I can't help but think this is almost a perfect fit for the need, given the scaffolding features for extremely simple implementations that capture raw fields with only a couple of lookups (i.e. a predefined category). I think this would be considered a very appropriate use of these frameworks.
Has anyone successfully built these types of quick-and-dirty apps in normally large-scale, heavy-handed enterprise environments? Any tips for communicating this need/appropriateness to non-technical management?
The only way to get this implemented in a rigid organization is to get it working and demo it, without approval. It's very hard for management to say no to a finished project.
I work for a really big company and have written many utility apps based on Rails (as well as contributed to some larger Rails projects). That said, the biggest concern is not the quality of the app, but who's going to support/maintain it when you leave or get hit by a bus.
IMHO, the major fear that an enterprise organization has, especially if the application becomes more critical to its core business, is how to support it. If it doesn't fit into its neat little box of supported technologies, it's less likely to happen.
Corporations have been bitten by this many times in the past and are cautious about bringing in new technology.
So, if you can drum up more folks to learn Ruby/Rails in your group (or elsewhere in your company), you may be able to make a good case for it. Otherwise, sad to say, you're probably better off implementing something on SharePoint :-(.
If you already have a Java infrastructure, then creating a Grails app will require little to no additional IT ramp up to support and maintain. The support and maintenance cost and effort should be the same as for a Java application (i.e. Grails apps run on Tomcat, use the same JVM, use the same diagnostic/profiling tools, etc.).
In my experience, larger IT organizations have a harder time supporting Ruby when it's not already in the toolchain, because it's a new language and a new deployment environment, and it requires a considerable amount of support and maintenance ramp-up.
I would develop a minimum viable product, then make friends with someone in IT who can help you deploy it into a staging or production environment. Then get a few of the users to hop on board and test it as if it were a beta product. After that, open it up to a larger audience.
So as others have said, forgiveness over permission, but be smart about the impact on the IT organization.
----edit-----
QUICK QUESTION: Does Grails take too many resources for a high-traffic website, and is it more expensive to host?
For example: if I can build a site with millions of users per month more easily in CakePHP, is it worth building it in Grails just to save some web server resources, or will it need more servers?
---------------
Hello,
I know there are a lot of similar questions on the net, but because I am a newbie in web development I didn't find the solution for my specific problem.
I am planning on creating a Flash games portal from scratch. There is a good chance that there will be big traffic from the beginning (millions of pageviews). I want to reduce server costs as much as possible, but at the same time not be tied to an expensive contract, as there is a chance that the project will not be as successful as I want, and in that case there would be very little money.
The question is: what technology should I use? I don't know any web dev technology yet, so it doesn't matter what I learn. My web dev experience is a little PHP 8 years ago; since then I have programmed in C++/Java (game and mobile development). I like the Java and C syntax and languages very much and I tend to dislike dynamic typing or non-robust scripting (like PHP), but I can get along with them if they are the best choices.
The candidates are now:
Grails (my best for now)
Ruby on Rails
CakePHP
Other technologies (Google App Engine, Python/Django, etc.)
At first I was considering using pure C and compiling the web app on the server, just to squeeze more out of the servers, but I soon understood that this is overkill.
Next my eyes fell on Ruby, as there is a lot of buzz about its ease of use. Then I discovered Grails and looked at Java, because it is said to be "faster". But I don't know what "faster" really means for my needs, so here comes the first question:
1) What will be my biggest consumption on the server, other than bandwidth, for a lot of Flash content requests? Is it memory? I heard that Java needs a lot of memory but is faster. Is it CPU? I am planning to take some daily VPS.NET nodes at first to see if there is demand, and if the "spike" is permanent, to move to a dedicated server (serverloft.com has some good offers); otherwise to stay with fewer nodes.
I was also considering developing on Google App Engine: cheap or free hosting at first, so I can test my assumptions, and very easy to use (no need for sysadmin work). But the costs become high with more usage (> 3 million games played / month, x MB each), and the issue with Google is that it locks me into this technology.
My other concern is scalability (not only for traffic/users, but also for adding functionality). My plan is to release a functional site in just 4 weeks (just the basic frontend and a quick basic backend, so I can modify some things and add games manually), and then to grow it and add more things to it. I am planning to take a slightly different approach from other portals, so I need to write it from scratch (a script will not do).
2) Will Grails take much more server resources than RoR or PHP? I heard that building on the Java stack will be hardware-expensive and is overkill if you're not making a banking application. My application will not be very complex (I hope, and I will try to keep it that way) but will have a lot of traffic.
I also took into account using a CDN for files, but the cheapest CDN I found was 5c/GB (vps.net), while the cost per GB on serverloft (http://www.serverloft.com/dedizierte-server/server-details.php?products=4) is only 1.79 cents/GB and includes the other resources as well.
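A quick worked comparison of the two per-GB prices above. The traffic figures are assumptions for illustration only: 3 million plays a month (the figure mentioned earlier) at a hypothetical 5 MB per game, counting 1 GB as 1000 MB. `Decimal` avoids floating-point rounding noise in money arithmetic.

```python
from decimal import Decimal


def transfer_cost_usd(plays: int, mb_per_play: Decimal, cents_per_gb: Decimal) -> Decimal:
    """Monthly transfer cost in dollars, using 1 GB = 1000 MB for simplicity."""
    gb = Decimal(plays) * mb_per_play / Decimal(1000)
    cost = gb * cents_per_gb / Decimal(100)  # cents -> dollars
    return cost.quantize(Decimal("0.01"))


# Hypothetical traffic: 3 million plays a month at an assumed 5 MB per game.
plays, mb = 3_000_000, Decimal("5")
cdn = transfer_cost_usd(plays, mb, Decimal("5"))           # 5c/GB CDN
dedicated = transfer_cost_usd(plays, mb, Decimal("1.79"))  # 1.79c/GB dedicated box
print(cdn, dedicated)  # 750.00 268.50
```

At any volume the ratio stays about 2.8x; what the sketch leaves out is that the dedicated server's flat monthly fee and bandwidth cap also need to be compared.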
I am new to this domain (web). I am learning the ropes and have been searching the web for about half a year, but I don't have any real practical experience, so I know I must have some naive thinking and other issues that I don't know about yet. Please give me any advice you want regarding anything, not just the specific questions asked.
And thank you so much for such a great community!
This is how I (on my blog) view web performance, especially for highly abstracted frameworks like Grails.
I don't understand the obsession with runtime performance. Given most project scenarios, your primary focus should be on your performance, as in your ability to get things done with a chosen technology.
For example, you will get more done in a given period of time with Groovy than with Java any day. Often one line of Groovy code will equate to 10 lines of Java code, etc.
Very rarely will byte code execution time be your performance issue; most often it's...
Bad algorithm implementation or design.
Bad DB design and/or queries.
Taking too long to get things done and then having all sorts of commercial relationship issues because of it.
With web applications you are usually not performing lots of long-running CPU-bound operations. Most of your request/response time is spent on the wire (internet routing, etc.) and in the DB (executing queries).
Choose a technology that takes a load off your mind and frees you from writing mountains of boilerplate code, so that you can concentrate instead on designing and implementing good algorithms, DBs, queries, etc.
How can I write a cloud-aware application, i.e. an application that benefits from being deployed on the cloud? Is it the same as an application that runs on a VPS/dedicated server? If not, what are the differences? Are there any design changes? What steps do I need to take if I am to migrate an application to be cloud-aware?
I am also about to implement a web application idea that needs features like security, performance, caching and, most importantly, has to be free. I have been comparing some frameworks and found that Django has the lowest RAM/CPU usage and works great in prefork+threaded mode, but I have also read that Django-based sites stop responding under a huge load of connections. Other frameworks that I have seen/know are Zend, CakePHP, Lithium/Cake3, CodeIgniter, Symfony, Ruby on Rails...
So I would like your opinion on this as well: please suggest a good free framework based on my needs.
Finally thanks for reading the essay ;)
I feel a matrix moment coming on... "what is the cloud? The cloud is all around us, a prison for your program..." (what? the FAQ said bring your sense of humour...)
OK, so seriously, what is the cloud? It depends on the implementation, but the usual features include scalable computing resources and charges per CPU-hour, storage, etc. So yes, it is a bit like developing on your VPS/a normal server.
As I understand it, Google App Engine allows you to consume as much as you want. The back-end resource management is done by Google and billed to you and you pay for what you use. I believe there's even a free threshold.
Amazon EC2 exposes an API that actually allows you to add virtual machine instances (someone correct me please if I'm wrong), having pre-configured them, deploy another instance of your web app, and talk between private IP ranges if you wish (Slicehost definitely allows this). As such, EC2 can let you act like a giant load balancer on the front end, passing work off to a whole number of VMs on the back end, or expose all of that publicly; take your pick. I'm not sure of the exact details because I didn't build the system, but that's how I understand it.
I have a feeling (but I know least about Azure) that on Azure, resource management is done automatically, for you, by Microsoft, based on what your app uses.
So, in summary, the cloud is different things depending on which particular cloud you choose. EC2 seems to expose an API for managing resources; GAE and Azure appear to be environments that grow and shrink in the background based on your use.
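The "front-end load balancer passing work to VMs on the back end" idea boils down to a dispatcher over a growable pool of instances. This is a toy, in-process sketch under that assumption: each backend is a hypothetical callable standing in for a web-app VM, and `add_backend` mimics bringing up another instance through the provider's API. It is not how EC2 or any real load balancer is implemented.

```python
import itertools
from typing import Callable, List

Backend = Callable[[str], str]  # hypothetical stand-in for one web-app VM


class RoundRobinBalancer:
    """Minimal round-robin dispatcher over a pool of backend instances."""

    def __init__(self, backends: List[Backend]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)

    def add_backend(self, backend: Backend) -> None:
        # Scaling out: rebuild the rotation so the new instance gets its turn.
        # (The rotation restarts from the first backend.)
        self._backends.append(backend)
        self._cycle = itertools.cycle(self._backends)

    def handle(self, request: str) -> str:
        # Each request goes to the next backend in turn.
        return next(self._cycle)(request)
```

Round robin is only the simplest policy; real balancers also do health checks and may weight by load.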
Note: I am aware there are certain constraints developing in GAE, particularly with Java. In a minute, I'll edit in another thread where someone made an excellent comment on one of my posts to this effect.
Edit as promised, see this thread: Cloud Agnostic Architecture?
As for a choice of framework, it really doesn't matter as far as I'm concerned. If you are planning on deploying to one of these platforms, you might want to check framework/language availability. I personally have just started Django and love it, having learnt Python a while ago, so, in my totally unbiased opinion, use Django. Other developers will probably recommend other things, based on their preferences. What do you know? What are you most comfortable with? What do you like the most? I'd go with that. I chose Django purely because I'm not such a big fan of PHP, I like Python and I was comfortable with the framework when I initially played around with it.
Edit: So how do you write cloud-aware code? You design your software so that it fits one of these architectures. Again, see the cloud-agnostic thread for some really good discussion of ways to do this. For example, you might talk to some services on GAE that scale. The fact that they are on GAE (as an example) doesn't really matter, because you use loose-coupling ideas. In essence, this is just a step up from the web service idea.
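One way to read "loose coupling" here: the application codes against a small interface, and whether the implementation is local or a scalable cloud service is a deployment detail. A minimal sketch, assuming a hypothetical key-value interface; `KeyValueService`, `InMemoryService` and `record_visit` are illustrative names, and a cloud-backed class would simply implement the same two methods.

```python
from typing import Dict, Optional, Protocol


class KeyValueService(Protocol):
    """The only contract the application depends on (hypothetical interface)."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryService:
    """Local implementation; a cloud-hosted one would expose the same methods."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


def record_visit(store: KeyValueService, user: str) -> int:
    """Application logic written only against the interface, not a concrete backend."""
    count = int(store.get(f"visits:{user}") or 0) + 1
    store.put(f"visits:{user}", str(count))
    return count
```

Swapping `InMemoryService` for a remote implementation changes nothing in `record_visit`, which is the property that makes moving between a VPS and a cloud platform cheap.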
Also, another feature of the cloud I forgot to mention is the idea of CDNs being provided for you: some cloud implementations might move your data around the globe to make it more efficient to serve, or just because that's where they've got space. If that's an issue, don't use the cloud.
I cannot answer your question, as I'm not experienced in such projects, but I can tell you one thing: both CakePHP and CodeIgniter are designed for PHP4, in other words, for really old technology. And it seems nothing is going to change in their case. Symfony (especially version 2.0, which is still in heavy beta) is worth considering, but as I said at the very beginning, I cannot support this with my own experience.
For designing applications to be deployed on the cloud, the main thing to consider is recoverability. If your server is terminated, you may lose all of your data. If you're deploying on Amazon, I'd recommend putting all the data you need persisted onto an Elastic Block Store (EBS) volume. This would be data like user-generated content/files, the database files and logs. I also take EBS snapshots on a 5-day rotation so that the volume itself is backed up. That said, I've had a cloud server up on AWS for over a year without any issues.
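The "5-day rotation" can be expressed as a small pure function that decides which snapshots to keep and which to delete. This sketch assumes one snapshot per day identified by its date; actually creating and deleting snapshots would go through the AWS API (for example boto3's EC2 `create_snapshot` and `delete_snapshot` calls), which is deliberately left out here.

```python
from datetime import date, timedelta
from typing import List, Tuple


def plan_snapshot_rotation(
    snapshot_dates: List[date], today: date, keep_days: int = 5
) -> Tuple[List[date], List[date]]:
    """Split snapshots into (keep, delete) for a rolling window of keep_days days."""
    if keep_days < 1:
        raise ValueError("keep_days must be at least 1")
    # The window includes today, so keep_days=5 keeps today and the 4 days before it.
    cutoff = today - timedelta(days=keep_days - 1)
    keep = sorted(d for d in snapshot_dates if d >= cutoff)
    delete = sorted(d for d in snapshot_dates if d < cutoff)
    return keep, delete
```

Keeping the decision separate from the API calls makes the retention rule easy to test before it is allowed to delete anything real.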
As for frameworks, I'm giving Grails a try at the minute and I'm quite enjoying it. It's built to be syntactically similar to Rails, but it runs on the JVM. That means you can take advantage of all the Java goodness, like threading, concurrency and all the great libraries out there, to build your web application.
I need to deploy a Delphi app in an environment that needs a centralized data and file storage system (for document imaging) but has multiple branch offices with relatively poor interconnectivity. I believe a 3-tier database application is the best way to go, so I can provide a rich desktop experience with relatively lightweight data transfer needs. So far I have looked briefly at Delphi DataSnap, kbmMW and RemObjects SDK. It seems that kbmMW and RemObjects SDK use the least bandwidth. Does anyone have experience deploying any of these technologies in a challenging environment with a significant number of users (I need to support 700+)? Thanks!
It depends on whether you are tied to remote datasets. If you aren't dataset-bound, then SOAP would likely be a good choice. Or, what I've done is write my own protocol that is similar to SOAP in nature. This was done before SOAP was standard and I'm glad I did: it gives you the ability to control more of the flow of data. It's a given that if you have poor connectivity, you will be spending time supporting it. It's very nice if it's your own code you are supporting rather than having to wait on a vendor. (Although KBM and REM are known to be pretty good vendors.)
Personal note: 700 users in a document imaging application over poor connectivity sounds like a mess. Spend the money on upgrading connectivity as it'll be cheaper in the long run.
Both kbmMW and RO SDK offer a binary format, which is more compact than the SOAP (XML) format, especially when you are working with documents.
RO SDK seems to offer more GUI tools to help you build your services.
Also give RealThinClient SDK a look; it's a lightweight remoting framework.
But whatever framework you go with, your design is what will make it fast or slow. I have some applications working on slow 128 kb lines, and they work perfectly without any user complaints, but I don't do large file transfers.
One thing to remember: it's not the number of users, but the number of them using the resources at the same time that will be the issue. Try to develop your application to be "server stateless" if at all possible; this will allow greater flexibility in the long term if you find you have to add more servers to the pool to support your customer base. The hardest thing about n-tier is scaling beyond the first server, so plan for that from the start. Each request should not know anything about a prior request, or at the very least, the request should have a way of passing its context so the server can look it up in a session table or something similar.
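The "pass the context so the server can look it up in a session table" pattern can be sketched as follows. Assumptions: `SessionTable` is a hypothetical stand-in for a table in a shared database that every application server can read, and the handler keeps nothing between calls, so any server in the pool can serve any request.

```python
import secrets
from typing import Dict


class SessionTable:
    """Hypothetical stand-in for a shared session table (e.g. a DB table all servers read)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}

    def create(self, context: Dict[str, str]) -> str:
        token = secrets.token_hex(16)  # opaque 32-character token sent with each request
        self._rows[token] = dict(context)
        return token

    def lookup(self, token: str) -> Dict[str, str]:
        # Raises KeyError for an unknown or expired token.
        return dict(self._rows[token])


def handle_request(sessions: SessionTable, token: str, doc_id: str) -> str:
    """A stateless handler: everything it needs arrives in the request or the table."""
    ctx = sessions.lookup(token)
    return f"{ctx['user']}@{ctx['branch']} opened {doc_id}"
```

Because the handler holds no per-client state, adding a second or third server only requires that they share the session table.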
Personally, I would recommend RemObjects. I have used it with good results.
I don't know if it's the very best / most efficient (glad you asked this question!), but I've had good results with RemObjects SDK + DataAbstract. The latter made many of the plumbing details less involved, which was helpful. Still implementing, but so far so good.
If you really want to go "low-bandwidth", use the BSD sockets API: that'll give you full control over what's being sent, and you can send as little information as you want. Of course, then you'll have to implement all the tiers yourself, but hey, that's still an option :D
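If you do roll your own protocol over raw sockets, the first thing you need is message framing, because TCP delivers a byte stream, not messages. A common minimal scheme, shown here as a sketch using Python's `socket` and `struct` modules (the function names are illustrative), prefixes each payload with a 4-byte big-endian length.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian unsigned payload length


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return _recv_exact(sock, length)
```

The overhead is 4 bytes per message; what goes inside the payload (and whether to compress it) is then entirely up to you.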
I have a memory of talking to people who have got so far in using Ruby on Rails and then had to abandon it when they have hit limits, or found it was ultimately too rigid. I forget the details but it may have had to do with using more than one database.
So what I'd like to know is what features/requirements fall outside of Ruby on Rails, or at least require such contortions that it is better to use another, more flexible framework, even though you may have to lose some elegance or write extra boilerplate code.
Rails (not ruby itself) is proud to be "Opinionated Software".
What this means in practice is that the authors of rails have a certain target audience in mind (themselves basically) and aim rails specifically at that. If X feature isn't needed for that target audience, it doesn't get added.
Off the top of my head, things that rails explicitly doesn't support that people may care about:
Foreign keys in databases
Connections to multiple DBs at once
SOAP web services (since rails 2.0)
Connections to multiple database servers at once
That said, it is very easy to extend rails with plugins, and there are plugins which add all of the above functionality to rails, and a lot more, so I wouldn't really count these as limits.
The only other caveat is that rails is built around the idea of creating CRUD web applications using MVC. If you're trying to do something which is NOT a CRUD web app (like twitter, which is actually a messaging system, or if you are insane and want to use a model like ASP.NET webforms) then you will also encounter problems. In this case you're better off not using rails, as you're essentially trying to build a boat out of bicycle parts.
In all likelihood, the problems you will run into that can't just be fixed with a quick plugin or a day or 2 of coding are all inherent problems with the underlying C Ruby runtime (memory leaks, green threads, crap performance, etc).
Ruby on Rails does not support two-phase commits out of the box, which may be required if your database-backed application needs to guarantee immediate consistency AND you need to use two or more database schemas.
For many web applications, I would venture that this is not a common use-case. One can perfectly well support eventual consistency with two or more databases. Or one could support immediate consistency with one database schema. The former case is a great problem to have if your app has to support a mondo amount of transactions (note the technical term :). The latter case is more typical, and Rails does just fine.
Frankly, I wouldn't worry about limits to using Ruby on Rails (or any framework) until you hit real scalability problems. Build a killer app first, and then worry about scalability.
CLARIFICATION: I'm thinking of things that Rails would have a hard-time supporting because it might require a fundamental shift in its architecture. I'll be generous and include some things that are part of the gem/plugin ecosystem such as foreign key enforcement or SOAP services.
By two-phase commits, I mean attempting to make two commits to physically distinct servers within one transactional context.
Use case #1 for a two-phase commit: you've clustered your database, so that you have 2 or more database servers and your schema is spread across them. You may want to commit to both servers because you want ActiveRecord to do a "foreign key map" that traverses the different servers.
Use case #2 for a two-phase commit: you're attempting to implement a messaging solution (sorry, I'm J2EE developer by day). The message producer commits to the messaging broker (one server) and to the database (a different server).
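To make "two commits to physically distinct servers within one transactional context" concrete, here is a toy sketch of the two-phase commit protocol. `Participant` is a hypothetical in-memory stand-in for a database server or message broker; a real coordinator would also need a durable transaction log, timeouts and recovery after a crash, all of which are omitted.

```python
from typing import List


class Participant:
    """Hypothetical resource manager (e.g. a DB server or a message broker)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: make the change durable but not yet visible, then vote yes/no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit on every participant or on none of them."""
    prepared: List[Participant] = []
    # Phase 1 (voting): ask each participant to prepare; stop at the first "no".
    for p in participants:
        if not p.prepare():
            for q in prepared:
                q.rollback()  # undo everyone who already voted yes
            return False
        prepared.append(p)
    # Phase 2 (completion): all voted yes, so tell everyone to commit.
    for p in participants:
        p.commit()
    return True
```

For use case #2, the database and the messaging broker would be the two participants: either the message is sent and the row is written, or neither happens.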
Also found some good discussion about the limits of ActiveRecord.
I think there is a greater “meta-question” here, that could be answered and that is “when is it OK to lean on external libraries to speed up development time?”
Third-party libraries are often great and can drastically reduce development time; however, there is a major problem, which Joel Spolsky calls “the law of leaky abstractions.” If you look that up on Google, his post will come up first. Essentially, the trade-off for the saved development time is that you have no idea what is going on under the covers. So when something breaks, you are completely stuck and have very limited means of debugging. It also means that if you hit one of the features that are simply unsupported in Rails and that you really need, you’ll have no next step except to write the feature yourself, if you’re lucky. Many libraries can make this difficult to do.
We’ve been burned badly in my dev shop by this issue. Our solutions worked fine under normal load, but we found that the third party subscription libraries that we were using simply could not stand up to the kind of load that we experienced once our site started to get a large number of concurrent users. This puts us in a very difficult place; essentially we have to rewrite the entire subscription service ourselves, with performance in mind. Doing this means that we’ve wasted all the time that we spent using the library.
Third-party libraries can be great for small to medium-sized applications; they can drastically reduce development time and hide complexities that aren’t necessary to deal with in the early stages of development. However, eventually they will catch up with you, and you’ll likely have to rewrite or re-engineer your solution to get past the “law of leaky abstractions.”
Ruby doesn't have functionality like IsPostBack in ASP.NET.
Orion's answer is right on. There are few hard limits to AR/Rails (deploying to Windows, AR connectors that aren't frequently used, e.g. Firebird), and even for the things he mentioned, multiple databases and DB servers, there are gems and plugins that address those for legacy, sharding, and other reasons.
The real limitation is how time-consuming it is to keep on top of everything the Rails devs are working on and to research specific issues, given how many blogs there are and how much mailing-list volume there is.