We are currently using PostgreSQL as the production database for our Rails application. It is a great database, but I am building the new version of our application around SQLite. We don't use advanced PostgreSQL features such as full-text search or PL/pgSQL. With SQLite, I love the idea of moving the database around as a single file and how simple it is to integrate with a server and with Rails, and the performance seems really good -> Benchmark
Our application's traffic is relatively high: around 1,200,000 views/day. So we do a lot of reads from the database, but only a few writes.
What do you think of that? Any feedback from anyone using, or trying (like us) to use, SQLite as a production database?
If you do lots of reads and few writes, then combine SQLite with some sort of in-memory cache (memcached or Redis are really good for this). This minimizes the number of reads that reach the database. The approach helps in any many-reads-few-writes environment, and in your specific case it helps you avoid hitting SQLite's weaknesses.
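As a rough illustration of the caching idea, here is a minimal cache-aside sketch in Python. Everything in it is an assumption made for the example: the `articles` table, the `CachedArticleStore` class, and the plain dict standing in for memcached or Redis. Reads check the cache first and only fall through to SQLite on a miss; writes go to SQLite and drop the cached entry.

```python
import sqlite3


class CachedArticleStore:
    """Cache-aside reads in front of a SQLite table (illustrative only)."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # stand-in for memcached/Redis
        self.db_reads = 0  # counts how often the database is actually hit

    def get_title(self, article_id):
        # Serve from the cache when possible.
        if article_id in self.cache:
            return self.cache[article_id]
        # Cache miss: read from SQLite and remember the result.
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT title FROM articles WHERE id = ?", (article_id,)
        ).fetchone()
        title = row[0] if row else None
        if title is not None:
            self.cache[article_id] = title
        return title

    def set_title(self, article_id, title):
        # Write to the database, then invalidate the cached copy.
        with self.conn:
            self.conn.execute(
                "UPDATE articles SET title = ? WHERE id = ?", (title, article_id)
            )
        self.cache.pop(article_id, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO articles (id, title) VALUES (1, 'Hello')")
conn.commit()

store = CachedArticleStore(conn)
```

With a real cache server the dict would be replaced by client calls, and you would probably want expiry times as well, but the read path and the invalidate-on-write step stay the same.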
SQLite is designed for embedded systems. It works fine with a single user but doesn't handle concurrent requests very well, and 1.2M views per day probably means you'll get plenty of those.
For read-only workloads, I think in theory it can be faster than an out-of-process database server, because you do not have to serialize data to memory or network streams; it's all accessed in-process. In practice it's possible an RDBMS could be faster. For example, MySQL has pretty good query caching, and for certain queries that could be an improvement because all your Rails processes would share the same cache. With SQLite they would not share a cache.
I am building a website with Asp.Net MVC 4 and C#, hosted in VPS. The site has lots of sharing comments/replies features (I need to store a lot of comments). I was planning to use MongoDB as its free and also very suitable for storing blogs/comments, but got a little hesitant after I read this article.
So now I am thinking of using MongoDb for storing comments and replies and SQL Server Express version with 10GB limitation for storing user accounts and other user profile data.
Is it okay to use 2 different databases like this in one web app? Or is there any other stable documents database similar to MongoDb that I can use and don't have to use SQL server ?
Is RavenDb an option ?
Thanks in advance !
Yes, it is okay to use two different databases in one web app. Even more, it's often a very good idea, as every type of database has its pros and cons and none fits every job. You've probably chosen MongoDB because you've heard it's quite good for storing lots of data and heavy querying. This is mostly true, but there are some caveats. On the other hand, SQL Server can be much slower (depending on the exact workload) and can be easily overloaded with heavy writes.
First of all, MongoDB does work well. Mostly. It tries as hard as it can under heavy load. But it's not as efficient with parallel writes (because of heavy use of locking) as some others, and it's not "very safe" if you have to keep several collections or documents synchronised, the way a SQL transaction that changes lots of things in different tables would, because Mongo gives you nothing more than document-level atomicity (every change to a single document is atomic).
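To make the multi-table transaction point concrete, here is a small sketch of what a relational database gives you. It uses Python's built-in `sqlite3` module purely as a stand-in for any SQL database; the `comments` and `user_stats` tables and the `add_comment` helper are invented for the example. The comment insert and the counter update either both happen or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)
conn.execute(
    "CREATE TABLE user_stats (user_id INTEGER PRIMARY KEY, comment_count INTEGER NOT NULL)"
)
conn.execute("INSERT INTO user_stats VALUES (1, 0)")
conn.commit()


def add_comment(conn, user_id, body, fail=False):
    """Insert a comment and bump the counter as one all-or-nothing unit."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO comments (user_id, body) VALUES (?, ?)", (user_id, body)
        )
        if fail:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "UPDATE user_stats SET comment_count = comment_count + 1 WHERE user_id = ?",
            (user_id,),
        )


def counts(conn):
    """Return (number of comment rows, stored counter for user 1)."""
    n_comments = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    counter = conn.execute(
        "SELECT comment_count FROM user_stats WHERE user_id = 1"
    ).fetchone()[0]
    return n_comments, counter


add_comment(conn, 1, "first")
try:
    add_comment(conn, 1, "second", fail=True)
except RuntimeError:
    pass  # the failed call left no half-written state behind
```

After the simulated failure the comment table and the counter still agree. With only document-level atomicity, the equivalent pair of writes to two collections could leave one updated and the other not.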
There are other stable document databases; e.g., look at CouchDB. It's more write-oriented and has some interesting possibilities thanks to Multi-Version Concurrency Control.
I'm planning on creating an app (Rails) that will have a very large collection of users - it'll start small but I would like it to be able to handle a million or more.
I want to build a system that will be able to handle 2500+ requests per second. Each request will require a write (for logging purposes) as well as a read from the enormous list of users, indexed by username (I was recommended to use MongoDB for this purpose) and the results of the read will be sent back to the user.
I am a little unclear about how mongo will handle both reads and writes, so I had this idea of using Mongo to sort of permanently store the records and then load them up into Redis every time the server starts up for even faster access so that Mongo doesn't have to deal with anything but the writes.
Does that sound reasonable or is that a huge misuse of Mongo and Redis?
The speed of delivery is of utmost importance.
It's possible, actually, to create the entire application using just Redis. What you'd want to do is research design patterns for Redis. A good place to start is this PDF by Karl Seguin called The Little Redis book.
For example, use Redis's hashes to save all users' information.
Further, if planned well you don't need to have another persistent storage such as Mongo or MySQL in conjunction with Redis as Redis is persistent itself. You just need to pick a good sharding/replication strategy that'll allow you to be flexible enough for future systemic changes.
I think the stack that you are asking about is certainly a very good solution and one that's pretty battle tested for high performance sites. Trello (created by same people who created this very site) uses a similar architecture as well as craigslist.
Trello Tech Stack Writeup
Craigslist also uses this
Redis is fast and has a great pub/sub mechanism in addition to the usual invalidation-type features, which makes it a superior cache to most. Mongo is a db I'm very familiar with, and I think it's great for all sorts of data store purposes, as well as being a solid enterprise db that scales well, protects data integrity and checks off a bunch of marks in the SLA enterprise jargon checklist.
I think it's a great combination, but really the question should be: do I even need this? For your load I think Mongo by itself could handle this quite nicely (and give data integrity), and if you really want you can run it on a server with enough memory to make sure your dataset fits in memory (denormalizing and good schema design are key). Foursquare runs exclusively on Mongo in memory.
So think about whether this is necessary, and remember that simple always wins. Redis/Mongo is super powerful, but it will also take a lot more work to master and administer two data stores.
Thanks,
Prasith
As others have mentioned, using a single service makes more sense to me. There's no reason to keep the logging data in memory, though. I'd try using something simple: a logfile if possible, or Scribe or Flume if you need to distribute the writes.
Heroku advises against this because of possible issues. I'm an SQL noob, can you explain the type of issues that could be encountered by using different databases?
I used sqlite3 in development and postgres in production for a while, but recently switched to postgres everywhere.
Things to note if you use both:
There are differences between sqlite3 and postgres that will bite you. A common thing I ran into is that postgres is stricter about types in queries (where :string_column => <integer> will work fine in sqlite and break in postgres). You definitely want a staging area that uses postgres if your dev is sqlite and it matters if your production app goes down because of a sql error.
Sqlite is much easier to set up on your local machine, and it's great being able to just delete/move .sqlite files around in your db/ directory.
taps allows you to mirror your heroku postgres data into your local sqlite db. It gets much slower as the database gets larger, and at a few 10s of tables and 100K+ rows it starts to take 20+ minutes to restore.
You won't get postgres features like ilike, the new key/value stores, or fulltext search.
Because you have to use only widely supported SQL features, it may be easier to migrate your app to mysql.
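To illustrate the type-strictness difference mentioned in the list above, here is a small sketch using Python's built-in `sqlite3` module (the `items` table is made up for the example). SQLite quietly accepts both a string stored in an INTEGER column and an integer compared against a TEXT column. PostgreSQL would typically reject the equivalent statements with a type error, which is the kind of bug that only shows up once the code reaches production.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (code TEXT, qty INTEGER)")

# SQLite stores the string 'lots' in an INTEGER column without complaint.
conn.execute("INSERT INTO items (code, qty) VALUES ('42', 'lots')")

# Comparing a TEXT column against an integer parameter still matches,
# because SQLite converts the integer to text before comparing.
matches = conn.execute(
    "SELECT COUNT(*) FROM items WHERE code = ?", (42,)
).fetchone()[0]

# typeof() reveals what was actually stored in the INTEGER column.
stored_type = conn.execute("SELECT typeof(qty) FROM items").fetchone()[0]

print(matches, stored_type)
```

This is only a demonstration of SQLite's behaviour. The exact PostgreSQL error depends on the driver and on how the value is bound, so treat the comparison as indicative rather than exhaustive.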
So why did I switch? I wanted some postgres-only features, kept hitting bugs that weren't caught by testing, and needed to be able to mirror my production db faster (pg_restore takes ~1 minute vs 20+ for taps). My advice is to stay with sqlite in dev because of the simplicity, and then switch when/if you need to down the road. Switching from sqlite to postgres for development is as simple as setting up postgres - there's no added complexity from waiting.
Different databases interpret and adhere to the SQL standard differently. If you were to, say, copy-paste some code from SQLite to PostgreSQL, there's a very large chance that it won't immediately work. If it's only basic queries, then maybe, but when dealing with anything more particular there's a very low chance of complete compatibility.
Some databases are also more up to date with the standard. It's a similar battlefield to that of internet browsers. If you've ever made some websites you'd know compatibility is a pain in the ass, having to get it to work for older versions and Internet Explorer. Because some databases are older than others, and some are even older than the standards, they have their own way of doing things, which they can't just scrap to jump to the standard because they would lose support for their existing larger customers (this is especially the case with a database engine called Oracle). PostgreSQL is sort of like Google Chrome: quite high up there on standards compliance but still with some of its own little quirks. SQLite is, as the name suggests, a lightweight database system. You can assume it lacks some of the more advanced functionality from the standards.
The database engines also perform the same actions differently. It is worth getting to know and understand one database and how it works (deeper than just the query level) so you can make the most of that.
I was in a (kind of) similar situation. Generally it is a very bad idea to use different database engines for production and test. There are multiple reasons:
SQL syntax differences including DML, DDL statements, stored procedures, triggers etc
Performance optimizations done on one DB won't be valid on the other
SQLite is an embedded database, PostgreSQL is not
They don't support the same data types
Different syntax/commands to configure/setup db. SQLite uses PRAGMAs
One should stick to one db engine, unless you have a really, really good reason. I can't think of any.
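As an example of the configuration difference, here is roughly what SQLite setup through PRAGMAs looks like from Python's built-in `sqlite3` module. The temporary file path is an assumption for the sketch, and `journal_mode` and `foreign_keys` are just two commonly used PRAGMAs. None of this carries over to PostgreSQL, which is configured through its own server settings instead.

```python
import os
import sqlite3
import tempfile

# WAL mode needs a real file, so use a throwaway path rather than :memory:.
db_path = os.path.join(tempfile.mkdtemp(), "app.sqlite3")
conn = sqlite3.connect(db_path)

# Switch to write-ahead logging; the PRAGMA returns the mode now in effect.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Foreign key enforcement is off by default and is enabled per connection.
conn.execute("PRAGMA foreign_keys=ON")
foreign_keys = conn.execute("PRAGMA foreign_keys").fetchone()[0]

conn.close()
print(journal_mode, foreign_keys)
```

Settings like these live in application code or connection setup for SQLite, so a test suite running on SQLite exercises none of the configuration your PostgreSQL production server actually uses.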
I am looking for a backend solution for an application written in Ruby on Rails or Merb to handle data with several billions of records. I have a feeling that I'm supposed to go with a distributed model and at the moment I looked at
HBase with Hadoop
Couchdb
Problems with HBase solution as I see it -- ruby support is not very strong, and Couchdb did not reach 1.0 version yet.
Do you have any suggestions for what to use for such a large amount of data?
Data will require rather fast imports sometimes of 30-40Mb at once, but imports will come in chunks. So ~95% of the time data will be read only.
Depending on your actual data usage, MySQL or Postgres should be able to handle a couple of billion records on the right hardware. If you have a particularly high volume of requests, both of these databases can be replicated across multiple servers (and read replication is quite easy to set up compared to multiple-master/write replication).
The big advantage of using a RDBMS with Rails or Merb is you gain access to all of the excellent tool support for accessing these types of databases.
My advice is to actually profile your data in a couple of these systems and take it from there.
There's a number of different solutions people have used. In my experience it really depends more on your usage patterns related to that data and not the sheer number of rows per table.
For example, "How many inserts/updates per second are occurring." Questions like these will play into your decision of what back-end database solution you'll choose.
Take Google for example: There didn't really exist a storage/search solution that satisfied their needs, so they created their own based on a Map/Reduce model.
A word of warning about HBase and other projects of that nature (don't know anything about CouchDB -- I think it's not really a db at all, just a key-value store):
Hbase is not tuned for speed; it's tuned for scalability. If response speed is at all an issue, run some proofs of concept before you commit to this path.
Hbase does not support joins. If you are using ActiveRecord and have more than one relation... well, you can see where this is going.
The Hive project, also built on top of Hadoop, does support joins; so does Pig (but it's not really sql). Point 1 applies to both. They are meant for heavy data processing tasks, not the type of processing you are likely to be doing with Rails.
If you want scalability for a web app, basically the only strategy that works is partitioning your data and doing as much as possible to ensure the partitions are isolated (don't need to talk to each other). This is a little tricky with Rails, as it assumes by default that there is one central database. There may have been improvements on that front since I looked at the issue about a year and a half ago. If you can partition your data, you can scale horizontally fairly wide. A single MySQL machine can deal with a few million rows (PostgreSQL can probably scale to a larger number of rows but might work a little slower).
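A minimal sketch of the partitioning idea, in Python: pick the partition from a stable hash of the key, so every process agrees on where a given user lives. The shard names and the `shard_for` helper are hypothetical, and this is only one possible scheme (range- or directory-based partitioning are alternatives).

```python
import zlib

# Hypothetical partition names; in practice these would map to connection settings.
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_for(username):
    """Map a username to one shard using a hash that is stable across processes."""
    # zlib.crc32 is used instead of the built-in hash(), which is salted per process
    # for strings and would send the same user to different shards on each restart.
    return SHARDS[zlib.crc32(username.encode("utf-8")) % len(SHARDS)]


for name in ("alice", "bob", "carol"):
    print(name, "->", shard_for(name))
```

Note that with plain modulo hashing, changing the number of shards moves most keys to a different shard, so the shard count is worth choosing with growth in mind.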
Another strategy that works is having a master-slave set up, where all writes are done by the master, and reads are shared among the slaves (and possibly the master). Obviously this has to be done fairly carefully! Assuming a high read/write ratio, this can scale pretty well.
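A rough sketch of the routing side of that setup, in Python: statements that start with SELECT are spread round-robin over the replicas, and everything else goes to the master. The `ReplicatedRouter` class and the server names are invented for the example, and the statement classification is deliberately naive.

```python
import itertools


class ReplicatedRouter:
    """Pick a server for each statement: reads rotate over replicas, writes hit the master."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over the replicas, or None if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        words = sql.lstrip().split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, and anything we cannot classify, go to the master.
        return self.primary


router = ReplicatedRouter("primary", ["replica-1", "replica-2"])
```

The "fairly carefully" part is mostly replication lag: a read sent to a replica right after a write may not see that write yet, so reads that must be fresh should be sent to the master as well.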
If your organization has deep pockets, check out what Vertica, AsterData, and Greenplum have to offer.
The backend will depend on the data and how the data will be accessed.
But for the ORM, I'd most likely use DataMapper and write a custom DataObjects adapter to get to whatever backend you choose.
I'm not sure what CouchDB not being at 1.0 has to do with it. I'd recommend doing some testing with it (just generate a billion random documents) and see if it'll hold up. I'd say it will, despite not having a specific version number.
CouchDB will help you a lot when it comes to partitioning/sharding your data and the like. It seems like it might fit your project, especially if your data format might change in the future (adding or removing fields), since CouchDB databases have no schema.
There are plenty of optimizations in CouchDB for read-heavy apps as well, and, based on my experience with it, that is where it really shines.
Is there drop-in replacement for ActiveRecord that uses some sort of Object Store?
I am thinking something like Erlang's MNesia would be ideal.
Update
I've been investigating CouchDB and I think this is the option I am going to go with. It's a toss-up between using CouchRest and ActiveCouch. CouchRest is pretty mature, and is used in the CouchDB peepcode episode, but it's not a drop-in replacement for ActiveRecord, which is a bit of a disadvantage.
Suffice to say CouchDB is pretty phenomenal.
Update (November 10, 2009)
CouchDB hasn't really worked for me. CouchDB doesn't really support arbitrary queries (queries need to be written and compiled ahead of time). It also breaks on very large datasets.
I have been playing with MongoDB and it's really incredible. Schema-less JSON data store with queries and indexing.
I've even started building a management tool for it called Ming.
Try Maglev!
ActiveCouch purports to be just such a library for CouchDB, which is, in fact, written in Erlang. I wouldn't say it's as mature as ActiveRecord though.
That is the closest thing I know of to what you're asking for.
Madeleine is an implementation of the Java Prevayler object store
see http://madeleine.rubyforge.org/
I'm currently working on a ruby object database that uses mysql as a backing store (hence it's called hybriddb) that you may be interested in.
It uses no SQL or migrations, you just save your objects to the database, it also tries to work around the conventional problems with object databases (speed, finding objects quickly, large object graphs) transparently.
It is still an early version so take care. The code is here
http://github.com/pauliephonic/hybriddb/tree/master The development branch has support for transactions and I'm currently adding basic validations.
I have a web site with some tutorials etc. http://www.hybriddb.org/pages/tutorial_starter
Any comments are welcome there.
Apart from Madeleine, you can also see:
http://purple.rubyforge.org/
But it depends on scale too. Mnesia is known to support large amounts of data and is clustered, whereas these solutions won't work so well with large amounts of data.
If the amount of data is not huge, another option is:
http://copiousfreetime.rubyforge.org/amalgalite/files/README.html