Extensible Rails application that connects to several databases - ruby-on-rails

I am implementing a Rails application that needs to aggregate search results from N independent heterogeneous databases. An example use case would be:
User queries "xpto"
The query is submitted to all the databases registered on the system
The results are transformed and combined in a predefined format
User gets the results
The application needs to provide an extensibility point for the introduction of new databases in the system. A database here can be of different kinds - a flat file, SQL database, REST interface to an SQL database, etc.
If I was working in C#/Java, ignoring speed issues, I would define a plug-in management system where each host would have a plug-in that would know how to query and transform the results of the host. New hosts would be easily introduced by defining a new plug-in and configuring the host in the system.
I am new comer to rails and I am looking for either ideas, tools or design patterns that can help me to solve this problem.

My best guess wolud be to write a custom ActiveRecord Adapter that would query all your databases and combine the results.

From the API reference:
Connections are usually created through ActiveRecord::Base.establish_connection and retrieved by ActiveRecord::Base.connection. All classes inheriting from ActiveRecord::Base will use this connection. But you can also set a class-specific connection. For example, if Course is an ActiveRecord::Base, but resides in a different database, you can just say Course.establish_connection and Course and all of its subclasses will use this connection instead.

Related

MongoDB with PostgreSQL in One Rails App

Can I use MongoDB and a PostgreSQL in one rails app? Specifically I will eventually want to use something like MongoHQ. So far I have failed to get this working in experimentation. And it concerns me that the MongoDB documentation specifically says I have to disable ActiveRecord. Any advice would be appreciated.
You don't need to disable ActiveRecord to use MongoDB. Check out Mongoid and just add the gem plus any models along side any of your existing ActiveRecord models. You should note that MongoHQ is just a hosting service for MongoDB and can be used alongside any Object Document Mapper (ODM).
For further details check http://mongoid.org/en/mongoid/docs/installation.html. Just skip the optional 'Getting Rid of Active Record' step.
On a recent client site I worked with a production system that merged MySQL and MongoDB data with a single Java app. To be honest, it was a nightmare. To join data between the two databases required complex Java data structures and lots of code, which is actually databases do best.
One use-case for a two database system is to have the pure transactional data in the SQL database, and the aggregate the data into MongoDB for reporting etc. In fact this had been the original plan at the client, but along the way the databases became interrelated for transactional data.
The system has become so difficult to maintain that is is planned to be scrapped and replaced with a MongoDB-only solution (using Meteor.js).
Postgres has excellent support for JSON documents via it's jsonb datatype, and it is fully supported under Rails 4.2, out of the box. I have also worked with this and I find it a breeze, and I would recommend this approach.
This allows an easy mix of SQL and NoSQL transactions, eg
select id, blast_results::json#>'{"BlastOutput2","report","results","search","hits"}'
from blast_caches
where id in
(select primer_left_blast_cache_id
from primer3_output_pairs where id in (185423,185422,185421,185420,185419) )
It doesn't offer the full MongoDB data manipulation features, but probably is enough for most needs.
Some useful links here:
http://nandovieira.com/using-postgresql-and-jsonb-with-ruby-on-rails
https://dockyard.com/blog/2014/05/27/avoid-rails-when-generating-json-responses-with-postgresql
There are also reports that it can outperform MongoDB on json:
http://www.slideshare.net/EnterpriseDB/the-nosql-way-in-postgres
Another option would be to move your Rails app entirely to MongoDB, and Rails has very good support for MongoDB.
I would not recommend running two databases, based on personal observations on how it can go bad.

Writing to 2 or more different data layers

looking for a way in rails to allow me to write into 2 different data layers (databases) at the same time. first layer being the most important and should hold the request until finished, others can be processed in the background.
for example if i have a Person model and i create a new one, i want the entry to be save in MongoDB for example but later saves to MySQL, cassandra and so on.
any ideas and links are welcome.
I am not sure about any rails solution but there is one java based ORM that can help you in achieving exactly this.
You can try exploring https://github.com/impetus-opensource/Kundera. You would love it.
Kundera is a JPA 2.0 compliant Object-Datastore Mapping Library for NoSQL Datastores and currently supports Cassandra, HBase, MongoDB and all relational datastores (Kundera internally uses Hibernate for all relational datastores).
In your case you can use your existing objects along with JPA annotations to store them in Cassandra,MongoDB, MySQL etc.
Since this is in java you can build a java based service that you call from your rails app.

Should I use multiple databases?

I am about to create an application with Ruby on Rails and I would like to use multiple databases, basically is an accounting app that will have multiple companies for each user. I would like to create a database for each company
I found this post http://programmerassist.com/article/302
But I would like to read more thoughts about this issue.
I have to decide between MySQL and PosgreSQL, which database might fit better my problem.
There are several options for handling a multi-tenant app.
Firstly, you can add a scope to your tables (as suggested by Chad Birch - using a company_id). For most use-cases this is fine. If you are handling data that is secure/private (such as accounting information) you need to be very careful about your testing to ensure data remains private.
You can run your system using multiple databases. You can have a single app that uses a database for each client, or you can have actually have a seperate app for each client. Running a database for each client cuts a little against the grain in rails, but it is doable. Depending on the number of clients you have, and the load expectations, I would actually suggest having a look at running individual apps. With some work on your deployment setup (capistrano, chef, puppet, etc) you can make this a very streamlined process. Each client runs in a completely unique environment, and if a particular client has high loads you can spin them out to their own server.
If using PostgreSQL, you can do something similar using schemas.
PostgresQL schemas provide a very handy way of islolating your data from different clients. A database contains one or more named schemas, which in turn contain tables. You need to add some smarts to your migrations and deployments, but it works really well.
Inside your Rails application, you attach filters to the request that switch the current user's schema on or off.
Something like:
before_filter :set_app
def set_app
current_app = App.find_by_subdomain(...)
schema = current_app.schema
set_schema_path(schema)
end
def set_schema_path(schema)
connection = ActiveRecord::Base.connection
connection.execute("SET search_path TO #{schema}, #{connection.schema_search_path}")
end
def reset_schema_path
connection = ActiveRecord::Base.connection
connection.execute("SET search_path TO #{connection.schema_search_path}")
end
The problem with answers about multiple databases is when they come from people who don't have a need or experience with multiple databases. The second problem is that some databases just don't allow for switching between multiple databases, including allowing users to do their own backup and recovery and including scaling to point some users to a different data server. Here is a link to a useful video
http://aac2009.confreaks.com/06-feb-2009-14-30-writing-multi-tenant-applications-in-rails-guy-naor.html
This link will help with Ruby on Rails with Postgresql.
I currently have a multi-tenant, multi-database, multi-user (many logons to the same tenant with different levels of access), and being an online SaaS application. There are actually two applications one is in the accounting category and the other is banking. Both Apps are built on the same structure and methods. A client-user (tenant) can switch databases under that user's logon. An agent-user such as a tax accountant can switch between databases for his clients only. A super-user can switch to any database. There is one data dictionary i.e. only one place where tables and columns are defined. There is global data and local data. Global data such as a master chart-of-accounts which is available to everyone (read only). Local data is the user's database. A new user can get a clone of a master database. There are multiple clones to choose from. A super-user can maintain the clone databases.
The problem is that it is in COBOL and uses ISAM files and uses the CGI method. The problem with this is a) there is a perception that COBOL is outdated, b) getting trained people, c) price and d) online help. Otherwise it works and I'm happy with it.
So I'm researching what to replace it with and what a minefield that is.
It has past time and the decission for this has been to use PostgreSQL schemas, making multitenant applications, I have a schema called common where related data is stored.
# app/models/organisation.rb
class Organisation < ActiveRecord::Base
self.table_name = 'common.organisations'
# set relationships as usual
end
# app/models/user.rb
class User < ActiveRecord::Base
self.table_name = 'common.users'
# set relationships as usual
end
Then for migrations I have done that with this excellent tutorial. http://timnew.github.com/blog/2012/07/17/use-postgres-multiple-schema-database-in-rails/ use this, this is way better than what I saw in other places even the way Ryan Bates did on railscasts.
When a new organisation is created then a new schema is created with the name of the subdomain the organisation. I have read in the past that it's not a good idea to use different schemas but it depends on the job you are doing, this app has almost no soccial component so it's a good fit.
No, you shouldn't use multiple databases.
I'm not really sure what advice to give you though, it seems like you have some very basic misunderstandings about database design, you may want to educate yourself on the basics of databases first, before going further.
You most likely just want to add a "company id" type column to your tables to identify which company a particular record belongs to.

Any thoughts on Multi-tenant versus Multi-database apps in Rails

Our app currently spawns a new database for each client. We're starting to wonder whether we should consider refactoring this to a multi-tenant system.
What benefits / trade-offs should we be considering? What are the best practices for implementing a multi-tenant app in Rails?
I've been researching the same thing and just found this presentation to offer an interesting solution: Using Postgre's schemas (a bit like namespaces) to separate data at the DB level while keeping all tenants in the same DB and staying (mostly) transparent to rails.
Writing Multi-Tenant Applications in Rails - Guy Naor
Multi-tenant systems will introduce a whole range of issues for you. My quick thoughts are below
All SQL must be examined and
refactored to include a ClientId
value.
All Indexes must be examined to
determine if the ClientId needs to be
included
An error in a SQL statement by a
developer/sysadmin in production will
affect all of your customers.
A database corruption/problem will
affect all of your customers
You have some data privacy issues
whereby poor code/implementation could
allow customerA to see data belonging
to CustomerB
A customer using your system in a
heavy/agressive manner may affect
other customers perception of performance
Tailoring static data to an individual customers preference becomes more complex.
I'm sure there are a number of other issues but these were my initial thoughts.
It really depends upon what you're doing.
We are making a MIS program for the print industry that tracks inventory, employees, customers, equipment, and does some serious calculations to estimate costs of performing jobs based on a lot of input variables.
We are anticipating very large databases for each customer, and we currently have 170 tables. Adding another column to almost every table just to store the client_id hurts my brain.
We are currently in the beta stage of our program, and here are some things that we have encountered:
Migrations: A Rails assumption is that you will only have 1 database. You can adapt it for multiple databases, and migrations is one of them. You need a custom rake task to apply migrations to all existing databases. Be prepared to do a lot of trouble shooting because a migration may succeed on one DB, but fail on another.
Spawning Databases: How do you create a new db? From a SQL file, copying an existing db, or running all migrations? How do you keep you schema consistent between your table creation system, and your live databases?
Connecting to the appropriate database: We use a cookie to store a unique value that maps to the correct DB. We use a before filter in an Authorized controller that inheirits from ActionController that gets the db from that unique value and uses the establish_connection method on a Subclass of ActiveRecord::Base. This allows us to have some models pull from a common db and others from the client's specific db.
If you have specific questions about any of these, I can help.
I don't have any experience with this personally, but during the lightning talks at the 2009 Ruby Hoedown, Andrew Coleman presented a plugin he designed and uses for multi-tenant databases in rails w/ subdomains. You can check out the lightning talk slides and here's the acts_as_restricted_subdomain repository.
Why would you? Do you have heavy aggregation between users or are you spawning too many DBs? Have you considered using SQLite files per tenant instead of shared DB servers (since multitenant apps often are low-profile and don't need that much concurrency)?

Generate new models and schema at runtime

Let's say your app enables users to create their own tables in the database to hold their own, custom data. Each table would have it's own schema. What are some good approaches?
My first stab involved dynamically creating migration files and model files bu I'd like to run this on heroku where you can't write to the filesystem.
I'm thinking eval may be the way to go to create and run the migration class and the model class. But I want to make sure the model class exists when a new process of the app is spawned. Can probably do this by storing these class definition with each user as they create new tables and then run through them all at startup. But now it's convulted enough that I may be missing something obvious.
It's probably a better idea not to generate new classes on runtime. Besides all of the security risks, each thread's startup time will be abominable if you ever get a significant number of users.
I would suggest rethinking your app design and aim at generic tables to hold the user's custom data. If you have examples of data structures that users can create we might be able to help.
Have you thought about a non-sql database for those tables? Look at CouchDB - there are several plugins on Github that integrate it with rails. Records in the database are JSON documents, with arbitrary key-value structure. May be perfect for a user-defined schema.
There is (was?) a cool Wiki project, called Informl. It was a Wiki, not just for web pages but for web applications. (Get it? It's informal because it's a Wiki, it's got forms because it is an application, and it's user-generated, thus Web 2.0, which means that according to an official UN resolution it is legally required to have a name which is missing a vwl.)
So, in other words, it was not just about user-generated content, but also user-generated structured data.
They did this by generating PostgreSQL-specific SQL at runtime to create new tables and then have ActiveRecord reload the schemas.
The code is up on RubyForge. It's based on Rails 1.2.3. I guess you could do much better than that today, especially with the upcoming extensibility interfaces in Rails 3.

Resources