Using both MongoDB and SQL Server Express with Asp.net MVC - asp.net-mvc

I am building a website with Asp.Net MVC 4 and C#, hosted in VPS. The site has lots of sharing comments/replies features (I need to store a lot of comments). I was planning to use MongoDB as its free and also very suitable for storing blogs/comments, but got a little hesitant after I read this article.
So now I am thinking of using MongoDb for storing comments and replies and SQL Server Express version with 10GB limitation for storing user accounts and other user profile data.
Is it okay to use 2 different databases like this in one web app? Or is there any other stable documents database similar to MongoDb that I can use and don't have to use SQL server ?
Is RavenDb an option ?
Thanks in advance !

Yes, it is okay to use two different databases in one web app. Even more - it's very good to do so, as every type of database has it cons and pros and they are never fit to do every job. You've probably chosen MongoDB because you've heard it's quite good for storing lots of data and heavy quering. This is mostly true but there are some caveats. On the other hand, SQL Server can be much slower (depends on exact workflow), and can be easily overloaded with heavy writes.
First of all - it does work well. Mostly. It tries as hard as it can, when on heavy load. But it's not as write efficient with parallel writes (because of heavy use of locking) as some others and it's not "very-safe" if you have to keep some collections or documents synchronised - e.g. SQL transaction, that does lots of stuff in different tables, as mongo does not give you anything more than document-level transactions (every change on single document is atomic).
There are other stable documents database, e.g. look at CouchDB - it's more write-oriented and has some funny possibilities thanks to Multi-Version Concurrency Control.

Related

Is using Redis right for this situation?

I'm planning on creating an app (Rails) that will have a very large collection of users - it'll start small but I would like it to be able to handle a million or more.
I want to build a system that will be able to handle 2500+ requests per second. Each request will require a write (for logging purposes) as well as a read from the enormous list of users, indexed by username (I was recommended to use MongoDB for this purpose) and the results of the read will be sent back to the user.
I am a little unclear about how mongo will handle both reads and writes, so I had this idea of using Mongo to sort of permanently store the records and then load them up into Redis every time the server starts up for even faster access so that Mongo doesn't have to deal with anything but the writes.
Does that sound reasonable or is that a huge misuse of Mongo and Redis?
The speed of delivery is of utmost importance.
It's possible, actually, to create the entire application using just Redis. What you'd want to do is research design patterns for Redis. A good place to start is this PDF by Karl Seguin called The Little Redis book.
For example, use Redis's hashes to save all users' information.
Further, if planned well you don't need to have another persistent storage such as Mongo or MySQL in conjunction with Redis as Redis is persistent itself. You just need to pick a good sharding/replication strategy that'll allow you to be flexible enough for future systemic changes.
I think the stack that you are asking about is certainly a very good solution and one that's pretty battle tested for high performance sites. Trello (created by same people who created this very site) uses a similar architecture as well as craigslist.
Trello Tech Stack Writeup
Craigslist also uses this
Redis is fast and has a great pub/sub mechanism in addition to normal invalidation type features that makes it a superior cache to most. Mongo is a db i'm very familiar with and think it's great for all sorts of data store purposes as well as being a solid enterprise db that scales well, protects data integrity and checks off a bunch of marks in the SLA enterprise jargon checklist
I think it's a great combination but really the question should be is do I even need this. For your load I think Mongo itself could handle this quite nicely (and give data integrity) and also if you really want you can run it on server with enough memory to make sure your dataset fits inside memory (denormalizing and good schema design is key). Foursquare runs exclusively on Mongo in memory.
So think if this is necessary but remember simple always wins. Redis/Mongo is super powerful but it will also take a lot more work to master two data stores and administer them.
Thanks,
Prasith
As others have mentioned, using a single service makes more sense to me. There's reason to keep the logging data in memory though. I'd try using something simple, a logfile if possible, or Scribe or Flume if you need to distribute the writes.

Using SQLite as production database, bad idea but

We are currently using postgresql for our production database in rails, great database, but I am building the new version of our application around SQLite. Indeed, we don't use advanced functions of postgres like full text search or PL/SQL. Considering SQLite, I love the idea to move the database playing with just one file, its simple integration in a server and in Rails, and the performance seems really good -> Benchmark
Our application's traffic is relatively high, we got something like 1 200 000 views/day. So, we make a lot of read from the database, but we make a few writes.
What do you think of that ? Feedback from anyone using or trying (like us) to use SQLite like a production database ?
If you do lots of reads and few writes then combine SQLite it with some sort of in-memory cache mechanism (memcache or redis are really good for this). This would help to minimize the number of accesses (reads) to the database. This approach helps on any many-reads-few-writes environment and it helps to not hit SQLite deficiencies - in your specific case.
SQLite is designed for embedded systems. It will work fine with a single user, but doesn't handle concurrent requests very well. 1.2M views per days probably means you'll get plenty of the latter.
For doing only reads I think in theory it can be faster than an out-of-process database server because you do not have to serialize data to memory or network streams, its all accessed in-process. In practice its possible an RDBMS could be faster; for example MySQL has pretty good query caching features and for certain queries that could be an improvement because all your rails process would use this same cache. With sqllite they would not share a cache.

Using different datastores in the Rails same app?

So this is more or less an implementation question, this is the senario I have, basically we have an app which uses MySQL as it's datastore, user accounts, transactions etc, but we want to add in a robust charting feature and the data will be stored in Redis, now basically my question is:
Is it possible, and what are the best practices for integrating another datastore into an app which already depends on another one. Can I use Rack to generate the reports? etc...
I want to turn this into a sort of open discussion because I think the need for a solution like this is going to rise as we see more and more key/value stores that offer benefits far different than a RDBMS, an NoSQL stores as well. They all have their benefits but no one solution covers them all.
Thoughts?
You can have models that do not inherit ActiveRecord::Base. Add your preferred Redis client gem, do whatever config is necessary, and start writing Redis models.
I can try to reopen this topic, because should be very practical.
Have same issue with this. I want to replicate data from SQL to NoSQL. SQL used as main database storage, because data integrity, relations etc. And NoSQL as secondary database storage set for reading. In SQL you have much associations divided to much tables. Many one-to-one association saved in different tables for better readability. This associations should be saved as one document with NoSQL. It gives unbelievable speed. Only one load. Great for data exchange for API.
Do someone positive experience with replication SQL data to more consistent NoSQL documents?

Ruby On Rails/Merb as a frontend for a billions of records app

I am looking for a backend solution for an application written in Ruby on Rails or Merb to handle data with several billions of records. I have a feeling that I'm supposed to go with a distributed model and at the moment I looked at
HBase with Hadoop
Couchdb
Problems with HBase solution as I see it -- ruby support is not very strong, and Couchdb did not reach 1.0 version yet.
Do you have suggestion what would you use for such a big amount of data?
Data will require rather fast imports sometimes of 30-40Mb at once, but imports will come in chunks. So ~95% of the time data will be read only.
Depending on your actual data usage, MySQL or Postgres should be able to handle a couple of billion records on the right hardware. If you have a particular high volume of requests, both of these databases can be replicated across multiple servers (and read replication is quite easy to setup (compared to multiple master/write replication).
The big advantage of using a RDBMS with Rails or Merb is you gain access to all of the excellent tool support for accessing these types of databases.
My advice is to actually profile your data in a couple of these systems and take it from there.
There's a number of different solutions people have used. In my experience it really depends more on your usage patterns related to that data and not the sheer number of rows per table.
For example, "How many inserts/updates per second are occurring." Questions like these will play into your decision of what back-end database solution you'll choose.
Take Google for example: There didn't really exist a storage/search solution that satisfied their needs, so they created their own based on a Map/Reduce model.
A word of warning about HBase and other projects of that nature (don't know anything about CouchDB -- I think it's not really a db at all, just a key-value store):
Hbase is not tuned for speed; it's tuned for scalability. If response speed is at all an issue, run some proofs of concept before you commit to this path.
Hbase does not support joins. If you are using ActiveRecord and have more than one relation.. well you can see where this is going.
The Hive project, also built on top of Hadoop, does support joins; so does Pig (but it's not really sql). Point 1 applies to both. They are meant for heavy data processing tasks, not the type of processing you are likely to be doing with Rails.
If you want scalability for a web app, basically the only strategy that works is partitioning your data and doing as much as possible to ensure the partitions are isolated (don't need to talk to each other). This is a little tricky with Rails, as it assumes by default that there is one central database. There may have been improvements on that front since I looked at the issue about a year and a half ago. If you can partition your data, you can scale horizontally fairly wide. A single MySQL machine can deal with a few million rows (PostgreSQL can probably scale to a larger number of rows but might work a little slower).
Another strategy that works is having a master-slave set up, where all writes are done by the master, and reads are shared among the slaves (and possibly the master). Obviously this has to be done fairly carefully! Assuming a high read/write ratio, this can scale pretty well.
If your organization has deep pockets, check out what Vertica, AsterData, and Greenplum have to offer.
The backend will depend on the data and how the data will be accessed.
But for the ORM, I'd most likely use DataMapper and write a custom DataObjects adapter to get to whatever backend you choose.
I'm not sure what CouchDB not being at 1.0 has to do with it. I'd recommend doing some testing with it (just generate a billion random documents) and see if it'll hold up. I'd say it will, despite not having a specific version number.
CouchDB will help you a lot when it comes to partitioning/sharding your data and like, seems like it might fit with your project -- especially if your data format might change in the future (adding or removing fields) since CouchDB databases have no schema.
There are plenty of optimizations in CouchDB for read-heavy apps as well and, based on my experience with it, is where it really shines.

Object database for Ruby on Rails

Is there drop-in replacement for ActiveRecord that uses some sort of Object Store?
I am thinking something like Erlang's MNesia would be ideal.
Update
I've been investigating CouchDB and I think this is the option I am going to go with. It's a toss-up between using CouchRest and ActiveCouch. CouchRest is pretty mature, and is used in the CouchDB peepcode episode, but it's not a drop-in replacement for ActiveRecord, which is a bit of a disadvantage.
Suffice to say CouchDB is pretty phenomenal.
Update (November 10, 2009)
CouchDB hasn't really worked for me. CouchDB doesn't really support arbitrary queries (queries need to be written and compiled ahead of time). It also breaks on very large datasets.
I have been playing with MongoDB and it's really incredible. Schema-less JSON data store with queries and indexing.
I've even started building a management tool for it called Ming.
Try Maglev!
AciveCouch purports to be just such a library for CouchDB, which is, in fact, written in Erlang. I wouldn't say it's as mature as ActiveRecord though.
That is the closest thing I know of to what you're asking for.
Madeleine is an implementation of the Java Prevayler object store
see http://madeleine.rubyforge.org/
I'm currently working on a ruby object database that uses mysql as a backing store (hence it's called hybriddb) that you may be interested in.
It uses no SQL or migrations, you just save your objects to the database, it also tries to work around the conventional problems with object databases (speed, finding objects quickly, large object graphs) transparently.
It is still an early version so take care. The code is here
http://github.com/pauliephonic/hybriddb/tree/master The development branch has support for transactions and I'm currently adding basic validations.
I have a web site with some tutorials etc. http://www.hybriddb.org/pages/tutorial_starter
Any comments are welcome there.
Apart from Madeleine, you can also see:
http://purple.rubyforge.org/
But it depends on scale too. Mnesia is known to support large amount of data, and is clustered, whereas these solutions won't work so well with large amount of data.
If amount of data is not huge, another options is:
http://copiousfreetime.rubyforge.org/amalgalite/files/README.html

Resources