Switching from SQL to MongoDB in Rails 3

I am considering switching a fairly big application (Rails 3.0.10) from our SQL databases (SQLite and Postgres) to MongoDB. I plan to put everything in it: mainly UTF-8 strings, binary files, and user data (maybe also a little full-text search). I have complex relationships (web structure: categories, tags, translations..., polymorphic too), and I feel that MongoDB's philosophy is to avoid that and put everything in one big document. Am I right?
Does anyone have experience with MongoDB in Rails, particularly with switching an app from ActiveRecord to Mongoid? Do you think it's a good idea? Do you know a guide/article that teaches the MongoDB way of organizing complex data?
P.S.: What I particularly like about MongoDB is the freedom offered by its architecture and its performance orientation. These are my main personal motivations for considering the switch.

I have been using MongoDB with Mongoid for 5-6 months, and have also worked with Postgres + AR and MySQL + AR. I have no experience with switching AR to Mongoid.
Are you facing performance issues, or do you expect to face them soon? If not, I would advise against the switch, as the decision seems to be based mostly on MongoDB's coolness factor.
They both have their pros and cons. I like the speed of MongoDB, but there are many restrictions on what you can do to achieve it (no joins, no transaction support, and slow field-vs-field queries such as updated_at > created_at).
If there are performance issues, I would still recommend sticking with your current system, as the switch would be a big task, and you would be better off spending half that time optimizing the current system. After reading the question, I get the feeling that you have never worked with MongoDB before; there are many things that can bite you, and you would not be fully aware of how to solve them.
However, if you still insist on switching, you need to carefully evaluate your data structures and the way you query them. In a relational database you have the normal forms, which have the advantage that whatever structure you start with, you will reach roughly the same end result once you normalize. In MongoDB, there are virtually unlimited ways to model your documents, and you need to model them carefully to reap MongoDB's benefits. The queries you need to run play a very important role in that structuring, along with the actual data you want to store.
Keep in mind that you do not have joins in MongoDB (this can be mitigated with good modeling). As of now, you also cannot have queries like field1 = field2; that is, you can't compare two fields, but must provide a literal to query against.
Take a look at this question: Efficient way to store data in MongoDB: embedded documents vs individual documents. Somebody points the OP to a discussion where embedded documents are recommended for a very similar scenario, yet the OP chooses standalone documents because of the nature of the queries he will use to fetch the data.
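To make the trade-off concrete, here is an illustrative pair of Mongoid models (3+ association syntax; all class names are invented):

class Post
  include Mongoid::Document
  # Embedded: comments live inside the post document. One read returns a
  # post with all of its comments, but comments cannot be queried across
  # posts on their own.
  embeds_many :comments
end

class Comment
  include Mongoid::Document
  embedded_in :post
end

# Referenced (standalone documents): reviews get their own collection and
# can be queried independently, at the cost of an extra query per article.
class Article
  include Mongoid::Document
  has_many :reviews
end

class Review
  include Mongoid::Document
  belongs_to :article
end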
All I want to say is that it should be an informed decision, taken only after you have completely modeled your system with MongoDB and run performance tests with real data to see whether MongoDB solves your problem; it should not be based on the coolness factor.
UPDATE:
You can do field1 = field2 using a $where clause, but it is slow and best avoided.
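For illustration, this is roughly what such a query looks like (the Article model and its timestamp fields are made up):

# In the mongo shell; $where evaluates JavaScript per document, so it
# cannot use indexes:
#   db.articles.find({ $where: "this.updated_at > this.created_at" })

# The same criterion through Mongoid:
stale = Article.where("$where" => "this.updated_at > this.created_at")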

We are currently switching a production application away from PostgreSQL, tsearch, and PostGIS. It has been a challenging process, to say the least. Our data model is a better fit for MongoDB because we don't need complex joins, and we can model our data very easily into the nested document structure MongoDB provides.
We have started a mirror site with the MongoDB changes in it so we can leave the production site alone while we stumble through the process. I don't want to scare you, because in the end we will be happy we made the switch, but it is a lot of work. I would agree with the answer from rubish: be informed, and make the decision you feel is best. Don't base it on the 'coolness' factor.
If you must change, here are some tips from our experience:
ElasticSearch fits well with Mongo's document structure and replaces PostgreSQL's tsearch full-text search extension.
It also has great support for point-based geo indexing (points of interest closest to a location, or within x miles/kilometers).
We are using Mongo's built-in GridFS to store files, which works great. It simplifies sharing user-contributed images and files across our cluster of servers.
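As a reference point, storing and reading a file with GridFS is only a few lines. This sketch assumes the era's mongo 1.x Ruby driver; the database and file names are placeholders:

require 'mongo'

db   = Mongo::Connection.new('localhost', 27017).db('myapp_development')
grid = Mongo::GridFileSystem.new(db)

# GridFS chunks the file across documents, so any app server that can
# reach the database can serve it back.
grid.open('uploads/avatar_42.png', 'w') do |f|
  f.write(File.read('/tmp/avatar.png'))
end

data = grid.open('uploads/avatar_42.png', 'r') { |f| f.read }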
We are using rake tasks to dump data out of PostgreSQL into YAML format. Then another rake task in the mirror site imports and converts the data into models stored in MongoDB.
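A simplified sketch of what such a task pair can look like (model, task, and path names are illustrative, not our actual code):

# lib/tasks/mongo_migration.rake
namespace :mongo_migration do
  desc 'Dump Postgres-backed users to YAML (run in the production app)'
  task :export_users => :environment do
    rows = User.all.map(&:attributes)
    File.open(Rails.root.join('tmp', 'users.yml'), 'w') { |f| f.write(rows.to_yaml) }
  end

  desc 'Load the YAML dump into Mongoid models (run in the mirror site)'
  task :import_users => :environment do
    YAML.load_file(Rails.root.join('tmp', 'users.yml')).each do |attrs|
      # Rename/convert attributes here as the document model requires.
      User.create!(attrs.except('id'))
    end
  end
end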
The data export/import might also work using a shared Redis database, Resque on both sides, and an observer in the production application to log changes as they happen.
We are using Mongoid as our ODM, and many scopes in our models needed to be rewritten to work with Mongoid instead of ActiveRecord.
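For a flavor of what changes, here is one before/after in Rails 3 / Mongoid 2 style (the Post model is invented):

# In the ActiveRecord app, a scope built on a SQL fragment:
#
#   class Post < ActiveRecord::Base
#     scope :recent, lambda { where('published_at > ?', 1.week.ago) }
#   end
#
# In the Mongoid mirror it becomes a criteria hash:
class Post
  include Mongoid::Document
  field :published_at, :type => Time
  # the lambda keeps 1.week.ago from being evaluated once at load time
  scope :recent, lambda { where(:published_at.gt => 1.week.ago) }
end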
Overall, we are very happy with MongoDB. It gives us much more flexibility in the way we model our data. I just wish we had discovered it before the project started.

Skip ActiveRecord when generating your app.
Alternatively, if you’ve already created your app, have a look at config/application.rb
and change the first lines from this:
require "rails/all"
to this:
require "action_controller/railtie"
require "action_mailer/railtie"
require "active_resource/railtie"
require "rails/test_unit/railtie"
It’s also important to make sure that the reference to active_record in the generator block is commented out:
# Configure generators values. Many other options are available, be sure to check the documentation.
# config.generators do |g|
# g.orm :active_record
# g.template_engine :erb
# g.test_framework :test_unit, :fixture => true
# end
As of this writing, it's commented out by default, so you probably won't have to change anything here.
I hope this is helpful as you switch your app from AR to Mongo.
Thanks.

Related

ActiveRecord and NoSQL

I've been working with Rails for a few years and am very used to ActiveRecord, but have recently landed a task that would benefit from (some) NoSQL data storage.
A small amount of data would be best placed in a NoSQL system, but the bulk would still be in an RDBMS. Every NoSQL wrapper/gem I've looked at, though, seems to necessitate the removal of ActiveRecord from the application.
Is there a suggested method of combining the two technologies?
Not sure which NoSQL store you are looking into, but we have used MongoDB in concert with Postgres for a while now. Helpful hint: they say you need to get rid of ActiveRecord, but in reality you don't. Most people say that only because you end up not setting up your database.yml and/or not running the rake commands that set up the AR database.
Remember also that Postgres has HStore and JSON datatypes, which give you functionality similar to NoSQL datastores. Also, if the data you want to store outside of your AR database is not very complex, I would highly recommend looking into Redis.
If you look at the Gemfile.lock of this project, you can see that it uses ActiveRecord with Mongoid.
Even if you use other gems that don't need ActiveRecord, that shouldn't bother you. If you keep ActiveRecord, just have a valid reason for doing so.
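For what it's worth, coexistence needs nothing special. A minimal Gemfile sketch (gem versions are illustrative):

gem 'rails', '3.0.10'
gem 'pg'        # ActiveRecord models stay in Postgres
gem 'mongoid'   # Mongoid documents live in MongoDB
gem 'bson_ext'  # C extension for the era's Ruby driver

# AR models subclass ActiveRecord::Base as usual; Mongoid models just
# include Mongoid::Document. The two coexist happily in app/models.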

Multi-schema Postgres on Heroku

I'm extending an existing Rails app, and I have to add multi-tenant support to it. I've done some reading, and seeing how this app is going to be hosted on Heroku, I thought I could take advantage of Postgres' multi-schema functionality.
I've read that there seem to be some performance issues with backups when multiple schemas are in use, but that information felt a bit outdated. Does anyone know if this is still the case?
Also, are there any other performance issues, or caveats I should take into consideration?
I've already thought about adding a field to every table so I can use a single schema, with that field referencing the tenants table, but given the time window, multiple schemas seem like the best solution.
I use Postgres schemas for a multi-tenancy site, based on some work by Ryan Bigg and the Apartment gem.
https://leanpub.com/multi-tenancy-rails
https://github.com/influitive/apartment
I find having a separate schema for each client an elegant solution that provides a higher degree of data segregation. Personally, I find that performance improves because Postgres can simply return all results from a table without having to filter on an 'owner_id'.
I also think it makes for simpler migrations and lets you adjust individual customer data without making global changes. For example, you can add columns to specific customers' schemas and use feature flags to enable custom features.
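A rough sketch of the per-tenant workflow with Apartment (the tenant and model names are made up):

Apartment::Tenant.create('acme')    # creates schema "acme" and migrates it
Apartment::Tenant.switch!('acme')   # sets the Postgres search_path
Customer.count                      # scoped to acme's schema; no owner_id filter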
My main argument relating to performance would be that backup is a periodic process, whereas scoping customer tables happens on every access. On that basis, I would take a performance hit on backups over slowing down the customer experience.

Mongoid: Is there a utility to "sync" the DB's fields with the current "schema" as defined in my Models?

Sorry if the question is awkwardly phrased -- still a Mongo/Mongoid/Rails newbie here.
What I'm trying to ask:
In my development, I've been changing around the design of my Models as I go, adding some fields here, removing some fields there (one of the great things about MongoDB/Mongoid is that you can do this very quickly and easily!)
Everything is working, but in browsing through the development database, I've got some "detritus" -- documents with the old fields (and data) that aren't being used. It's no big deal other than to my garbage-collective sensibilities. I could, in theory, drop the DB and start from scratch, but that's messy.
Is there a utility / gem / etc. that will, essentially, look at the current document design and drop any fields in the live DB that don't match up to the data model?
I know this can be done manually, and I know about the mongoid migrations gems that are out there -- those are both good and, ultimately, more thorough solutions (which I'll look at).
For now, though, I'm wondering if there's a simple "quick shot" type of utility to simply sync up the DB and drop any fields that aren't explicitly specified in my models.
Thanks!
I don't know of any tools that do this for you. It's not a common ask because most people see this flexibility as a useful feature.
For your development database, you should probably clear it out and start over.
In production you have a couple of choices:
Write your code to be robust against missing fields and against database documents not matching your Mongoid model; in other words, be prepared for the two to get out of sync.
Get into the habit of migrating your data every time you change the model. It's a bit of a hassle and not strictly necessary if you follow the first option, but if the untidiness bothers you, this is a fine idea. See Managing mongoid migrations for strategies; a quick cleanup sketch follows.
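While I know of no packaged utility, the "quick shot" can be scripted in a few lines from a rails console. This sketch assumes a recent Mongoid where Model.collection exposes the mongo 2.x driver; the model list is hypothetical, and you should run it against a copy of your data first:

[Post, Comment].each do |model|
  known = model.fields.keys                # declared fields, including _id
  model.collection.find.each do |doc|
    stale = doc.keys - known - ['_type']   # keep Mongoid's inheritance key
    next if stale.empty?
    unset = stale.each_with_object({}) { |key, memo| memo[key] = 1 }
    model.collection.find('_id' => doc['_id']).update_one('$unset' => unset)
  end
end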

The Ruby community values simplicity...what's your argument for simplifying a db schema in a new project?

I'm working on a project with developers who have not worked with Ruby OR Rails before.
They have created a schema that is too complicated, in my opinion. The schema has 117 tables, and obtaining the simplest piece of information requires traversing/joining 7 tables... and of course, there's no "main" table serving as a key between them. The schema renders many of the Rails tools, like the find method and the has_many/belongs_to associations, almost useless. And coding for all of these relationships will likely be more time-consuming than we have the money for.
THE QUESTION:
Assuming you are VERY convinced (IMHO...hehe) that the schema is not ideal, and there are multiple ways to represent the domain, how would you argue FOR simplifying the schema (aside from what I've already said)?
I'll take on two roles here:
DBA: Database admin/designer.
Dev: Application developer.
I assume the DBA is a person who really knows all the database tricks. Really knows.
DBA:
The database is the key part of the application and should have a predefined structure in order to serve its purpose well and with the best performance.
If the tools cannot work with an arbitrary schema that is reasonably normalised and well designed, then the tools are wrong.
Dev:
The database is just a data store, so we need to keep it simple and concentrate on the application.
DBA:
The database is not just a store; it is the core of the application. There is no application without the database.
Dev:
No. The application is the core. There is no application without the front end and the business logic applied to it.
And the war begins...
Both points are valid, and it is always a trade-off.
If the database will ONLY be used by RoR, then you can use it more like a simple store.
If the DB may be used by other applications, or if it will hold a large amount of data under high traffic, it must follow some best practices.
Generally, you cannot simply overrule the DBA, but they can understand your situation and might allow you to loosen the standards a bit so you can be more productive.
So you need to work closely together, and you need to talk to each other to explain and prove why the database should be like this or that.
Otherwise, the team is broken and the project will fail with high probability.
ActiveRecord is a very handy tool, but it cannot do everything for you, and by default it does not produce exactly the database structure a DBA expects. So it has to be tuned.
On one side, if the DBA can accept that all PKs are auto-incremented integers, that makes the developers' lives easier (ActiveRecord does this by default).
On the other side, if the developers accept some of the DBA's constraints, that makes the DBA's life easier.
Now to answer your question:
how would you argue FOR simplifying the schema
Do not argue. Meet the team, deliver the message, and point out WHY it should be done.
Maybe it really shouldn't be done and you don't know all the facts; or maybe they are not aware of something.
You could agree on the general structure of the database AND try to describe it using RoR migrations as a meta-language (sketched below).
This way they would see the general picture, and you would get to use your great ActiveRecord.
And also everybody would be on the same page.
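Illustrative only: a plain Rails 3 migration used as that shared meta-language (all names are invented). Even a DBA who has never read Ruby can follow it:

class CreateArticles < ActiveRecord::Migration
  def self.up
    create_table :articles do |t|      # integer auto-increment PK by default
      t.string     :title, :null => false
      t.references :author             # author_id, the AR-conventional FK
      t.timestamps                     # created_at / updated_at
    end
    add_index :articles, :author_id
  end

  def self.down
    drop_table :articles
  end
end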
Your DB schema should reflect the domain and its relationships.
De-normalisation should only be done when you have measured that there is a performance problem.
7 joins is not excessive or bad, provided you have good indexes in place.
The general way to make this argument up the chain is based on cost: if you do things simply, there will be less code and fewer bugs, the system can be built more quickly or with more features, and it will thus create more ROI. If you can get the money manager on board with that approach, he or she may let you dictate terms to the team. There is a counterargument that extreme over-normalization prevents bad data, but I have found this is not the case, as the complexity it engenders tends to lead to more errors and more database code overall.
The architectural and technical argument here is simple. You have decided to use Ruby on Rails. Therefore you have decided to use the ActiveRecord pattern. The ActiveRecord pattern is driven by having the database tables match the object model. That's the pattern in use here, and in many other places, so the best practices they are trying to apply for extreme data normalization simply do not apply. Buy a copy of Patterns of Enterprise Application Architecture and put the little red bookmark at page 160 so they can understand how the pattern works from the architecture perspective.
What the DBA types tend to be unaware of is how much work ActiveRecord does for you: query generation, cascading deletes, optimistic locking, auto-populated columns, versioning (with acts_as_versioned), soft deletes (with acts_as_paranoid), and so on. There is a strong argument for using well-tested, community-supported library functions to perform these operations rather than custom code that must be maintained by a DBA.
The real issue with DBAs, then, is that they need some work to do. Let them focus on monitoring performance, finding slow queries in the code, creating indexes, and doing backups.
If you end up losing the political battle for a sane schema, you may want to consider switching to DataMapper; it's the next pattern in PoEAA. The other thing you may be able to get them to do is create views in the database that correspond to the object model. That way you could use many of ActiveRecord's finder capabilities against the views, while writing custom insert, update, and delete methods.
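A sketch of that views idea (all names are invented): the view gives ActiveRecord's finders a denormalised surface while writes stay custom.

class CreateOrderSummaries < ActiveRecord::Migration
  def self.up
    execute <<-SQL
      CREATE VIEW order_summaries AS
      SELECT o.id, c.name AS customer_name, o.total
      FROM orders o JOIN customers c ON c.id = o.customer_id
    SQL
  end

  def self.down
    execute 'DROP VIEW order_summaries'
  end
end

# Finders work against the view; insert/update/delete become custom methods.
class OrderSummary < ActiveRecord::Base
end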

Object database for Ruby on Rails

Is there drop-in replacement for ActiveRecord that uses some sort of Object Store?
I am thinking something like Erlang's Mnesia would be ideal.
Update
I've been investigating CouchDB and I think this is the option I am going to go with. It's a toss-up between using CouchRest and ActiveCouch. CouchRest is pretty mature and is used in the CouchDB PeepCode episode, but it's not a drop-in replacement for ActiveRecord, which is a bit of a disadvantage.
Suffice to say CouchDB is pretty phenomenal.
Update (November 10, 2009)
CouchDB hasn't really worked out for me. CouchDB doesn't really support arbitrary queries (queries need to be written and compiled ahead of time), and it breaks on very large datasets.
I have been playing with MongoDB, and it's really incredible: a schema-less JSON data store with queries and indexing.
I've even started building a management tool for it called Ming.
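A tiny illustration of the "queries and indexing" point, using the current (2.x) Ruby driver; the collection and field names are made up:

require 'mongo'

client = Mongo::Client.new('mongodb://localhost:27017/demo')
posts  = client[:posts]

posts.indexes.create_one(:tags => 1)   # secondary index, no schema required
posts.insert_one(:title => 'Hello', :tags => %w[ruby mongodb])
posts.find(:tags => 'ruby').each { |doc| puts doc['title'] }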
Try MagLev!
ActiveCouch purports to be just such a library for CouchDB, which is, in fact, written in Erlang. I wouldn't say it's as mature as ActiveRecord, though.
That is the closest thing I know of to what you're asking for.
Madeleine is a Ruby implementation of Prevayler, the Java object store; see http://madeleine.rubyforge.org/
I'm currently working on a Ruby object database that uses MySQL as a backing store (hence the name HybridDB) that you may be interested in.
It uses no SQL or migrations; you just save your objects to the database. It also tries to transparently work around the conventional problems with object databases (speed, finding objects quickly, large object graphs).
It is still an early version, so take care. The code is at http://github.com/pauliephonic/hybriddb/tree/master; the development branch has support for transactions, and I'm currently adding basic validations.
I have a web site with some tutorials etc. at http://www.hybriddb.org/pages/tutorial_starter. Any comments are welcome there.
Apart from Madeleine, you can also see:
http://purple.rubyforge.org/
But it depends on scale, too: Mnesia is known to support large amounts of data and is clustered, whereas these solutions won't work as well with large amounts of data.
If the amount of data is not huge, another option is:
http://copiousfreetime.rubyforge.org/amalgalite/files/README.html
