Possible to have one app on Heroku that dynamically uses different databases? - ruby-on-rails

I have an idea for a multi-tenant app, and I'm trying to decide if I should use one large database or use separate databases for each tenant.
I don't even know if the latter is possible in Rails, or with Rails on Heroku.
I also don't know if this is a good idea, or cost prohibitive.
But I guess to start I just want to know if it's possible.

There are many approaches to multi-tenancy, each with its own pros and cons. Postgres has a nice feature called schemas, which means you can have one database with multiple namespaces inside it. This can be a convenient solution for Rails, since Rails was designed to connect to only one database, and it is easy to integrate via the Apartment gem, which takes care of migrations and tenant switching based on specified rules, usually the subdomain. But this solution has downsides: while Postgres does not put any limit on the number of schemas, migrations will take forever once you have a lot of them, and there are problems with backups. Heroku recommends using fewer than 50 schemas.
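For illustration, a minimal Apartment setup along those lines might look like the sketch below; the Tenant model and its subdomain column are assumptions, not anything from the original question.

```ruby
# config/initializers/apartment.rb -- a minimal sketch
Apartment.configure do |config|
  # models that stay in the shared (public) schema
  config.excluded_models = ["Tenant"]
  # one Postgres schema per tenant, named after its subdomain
  config.tenant_names = -> { Tenant.pluck(:subdomain) }
end

# switch schemas automatically based on the request's subdomain
Rails.application.config.middleware.use Apartment::Elevators::Subdomain
```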
If you want multiple physical databases, then it is a little bit tricky with Rails. There are some gems that allow connecting to multiple databases; recently I heard about the Octoshark gem, but I haven't used it.
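Without a gem, the bare ActiveRecord mechanics look roughly like the sketch below (assuming a reasonably modern Rails; current_tenant_db_config is a hypothetical per-request lookup). Note that naively calling establish_connection on every request creates a new connection pool each time; managing those pools properly is exactly the bookkeeping that gems in this space handle.

```ruby
# abstract base class whose connection gets re-pointed per tenant
class TenantRecord < ActiveRecord::Base
  self.abstract_class = true
end

class ApplicationController < ActionController::Base
  around_action :connect_to_tenant_database

  private

  def connect_to_tenant_database
    # point the abstract base class at this tenant's database;
    # `current_tenant_db_config` is a hypothetical config lookup
    TenantRecord.establish_connection(current_tenant_db_config)
    yield
  end
end
```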
In summary: Postgres schemas are nice if you want good isolation without too much work, and they are cost-efficient on Heroku because you use only one database, but they won't scale to a large number of tenants. Multiple databases provide the best isolation, but Rails support for this solution is not that great, and it will be costly because you will need to provision a separate database for each tenant. The last resort is to just use one database and scope all your tenant data with a tenant_id. With that solution you have to guarantee isolation yourself, which requires additional work, and it is easy to miss some parts of the application.
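A minimal sketch of that tenant_id scoping (the tenant association and the current_tenant helper are assumptions):

```ruby
# row-level tenancy: every tenant-owned model carries a tenant_id
class Comment < ActiveRecord::Base
  belongs_to :tenant

  # every query must be scoped to one tenant; forgetting to apply
  # this somewhere is exactly the isolation risk mentioned above
  scope :for_tenant, ->(tenant) { where(tenant_id: tenant.id) }
end

# usage in a controller, `current_tenant` being a hypothetical helper
Comment.for_tenant(current_tenant).order(:created_at)
```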

Related

How to handle database scalability with Ruby on Rails

I am creating a management system and I want to know how Ruby on Rails can support me in ensuring that each customer's information, records, and tables stay independent from other customers'.
Is it better to put everything in one database and add a customer identifier, filtering queries on that parameter, or to create a database for each customer automatically?
I admit that the second option attracts me more ... And I know that putting everything in one database will be detrimental to performance, because I assume that customers and their data will increase exponentially!
I want to know which option is more viable in the long run. And if the best option is to create separate databases, how can I do this with Ruby on Rails?
There are pros and cons to both solutions, and which is better really depends on your use case.
Separating each customer into its own database definitely has advantages for scaling, running in different data centres, or even running on-site. However, it comes with higher complexity. For instance, you can't query across customers anymore; you would need to run queries for each customer and aggregate the results. This approach is called multi-tenancy (or sharding). There is a good gem called Apartment available (https://github.com/influitive/apartment).
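For a feel of the API, creating and switching tenants with Apartment looks roughly like this (Apartment uses Postgres schemas by default, though it can target separate databases; the tenant name here is made up):

```ruby
# create an isolated tenant for a new customer (a Postgres schema
# by default) and load the database schema into it
Apartment::Tenant.create("acme_corp")

# run a block against that customer's data, then switch back
Apartment::Tenant.switch("acme_corp") do
  User.count  # only counts acme_corp's users
end
```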
Keeping everything in one database might be simpler to start off with, as it's less complex, but again it really depends on your use case.
Edit
Adding some more information based on the questions.
There are several reasons to use a one-db-per-client architecture:
1. You have clearly separated tenants. In that case it might make sense to go with the one-db-per-client approach.
2. Scale. Having a separate database for each tenant of course makes scaling easier.
If 2) is the main reason you want to go for a one-db-per-client approach, I would strongly advise against it. You add so much more complexity to your app, which you might not need for years to come (if ever).
If scaling is your main concern I recommend reading Designing Data Intensive Applications by Martin Kleppmann. But basically, don't worry about scale for the first few years and focus on your product.

Data distribution for a system with SOA

I have a Rails application which manages different types of items and the users who own them. Items of different types might have different features. There are a number of Sinatra services which have to access the items (read-only; each service handles one specific item type).
Is it a good idea to create separate tables/databases for every service and to keep them in sync with the Rails DB? In this case the main DB will hold all items. It's Postgres, so hstore could be used for the differing features. On every update a sync message would be sent using Redis pub/sub or RabbitMQ messaging; services would subscribe and update their service-specific tables.
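For illustration, the publish/subscribe side of that sync might look roughly like this with the redis gem (the channel name, payload shape, and refresh_local_copy handler are all assumptions):

```ruby
require "redis"
require "json"

# publish side (Rails app): announce an item change on a channel
def publish_item_update(item_id)
  Redis.new.publish("item_updates", { id: item_id, action: "update" }.to_json)
end

# subscribe side (each Sinatra service): update service-specific tables;
# `refresh_local_copy` is a hypothetical service-specific handler
def listen_for_item_updates
  Redis.new.subscribe("item_updates") do |on|
    on.message do |_channel, message|
      payload = JSON.parse(message)
      refresh_local_copy(payload["id"])
    end
  end
end
```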
The system should be really reliable, scalable, and prepared for high load and new, not-yet-known item categories. What do you think? Does it make sense, or are there better approaches for these requirements? Thank you in advance, I really appreciate your help!
There is no one-size-fits-all answer here. The answer depends on your requirements and these will decide which of two approaches you might take.
The first approach is conceptually the simplest, which is to have every service hit the same database. The advantage here is that you can scale up relatively easily, the system is simple and flexible, and you can do a lot with the database to keep things working well. The disadvantage is that db downtime will take down all services at once.
The second approach is to keep every service (or group of closely related services) as a separate, self-contained service, kept in sync with some sort of message passing. This has the advantage of being more robust in terms of delivering basic services, but far less robust in terms of everything staying in sync (because the CAP theorem's consistency requirement is sacrificed for availability, and your data is effectively partitioned).
I don't know which one you will want to use. To the extent possible I would usually choose the single db approach but I am a Postgres guy, not a Rails guy. The second approach also works quite well in some cases but it does have a complexity cost.

Multi-schema Postgres on Heroku

I'm extending an existing Rails app, and I have to add multi-tenant support to it. I've done some reading, and seeing how this app is going to be hosted on Heroku, I thought I could take advantage of Postgres' multi-schema functionality.
I've read that there seem to be some performance issues with backups when multiple schemas are in use, but the information I found felt a bit outdated. Does anyone know if this is still the case?
Also, are there any other performance issues, or caveats I should take into consideration?
I've already thought about adding a field to every table so I could use a single schema, with that field referencing a tenants table, but given the time window, multiple schemas seem like the best solution.
I use Postgres schemas for a multi-tenancy site, based on some work by Ryan Bigg and the Apartment gem.
https://leanpub.com/multi-tenancy-rails
https://github.com/influitive/apartment
I find having separate schemas for each client an elegant solution which provides a higher degree of data segregation. Personally I find that performance improves, because Postgres can simply return all results from a table without having to filter on an owner_id.
I also think it makes for simpler migrations and allows you to adjust individual customers' data without making global changes. For example, you can add columns to specific customers' schemas and use feature flags to enable custom features.
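For example, a per-customer tweak might look roughly like this with Apartment (the schema and column names are made up):

```ruby
# add a column only inside one customer's schema; the public schema
# and all other customers are untouched
Apartment::Tenant.switch("acme_corp") do
  ActiveRecord::Base.connection.add_column(:orders, :po_number, :string)
end
```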
My main argument relating to performance would be that backup is a periodic process, whereas customer table scoping would be on every access. On that basis, I would take any performance hit on backup over slowing down the customer experience.

Single or multiple databases? (Rails 3)

I am reasonably new to Ruby on Rails, so I am not sure how to implement this. My understanding is that Rails is not designed with multiple databases in mind, although I could use establish_connection etc. to make it work.
My main problem is:
I have a SaaS application that will serve several businesses. Each business will have several database tables such as: users, comments, messages, transfers, navigation history, logs, etc. It seems I have 3 options:
1: Store everybody's data in one database with every object belonging_to a business or just tagging something like a businessID/name. Use this tag to fetch the appropriate data and worry about scaling/performance later as my app grows. (Would I have to worry about this pretty early on?)
2: One database per Business. No need to store associations, and db queries perform consistently throughout the application's life (possibly bad assumption here).
3: Have separate instances of my app each running some number of businesses (not sure this is any good).
What I have seen used in other frameworks/businesses is just (2) multiple dbs.
I am also really interested in what the best practice in Rails is. I know several applications have this same problem, and hearing how it has been solved will help.
Any help is much appreciated. Thank you so much.
Env.
Ruby 1.9.2
Rails 3.1
Production: Heroku or EY (still deciding; currently running on Heroku)
According to this page, you'd need to apply some metaprogramming for multiple databases.
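The metaprogramming-free baseline is establish_connection on an abstract class, roughly as sketched below (the connection details are placeholders, not anything from the question):

```ruby
# one abstract base class per business, bound to that business's
# database; the connection details here are placeholders
class BusinessOneRecord < ActiveRecord::Base
  self.abstract_class = true
  establish_connection(
    adapter:  "postgresql",
    host:     "localhost",
    database: "business_one_production"
  )
end

# models for that business inherit the connection
class User < BusinessOneRecord
end
```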
Why not make your deployment script deploy to different directories with different database settings? One branch per business? It might require some more maintenance, but it allows for per-business code if you need it.

Am I the only one that queries more than one database?

After much reading on Ruby on Rails and multiple database connections, it seems I have found something that not that many folks do, at least not with RoR. I am used to querying many different databases and schemas and pulling back the information, either for a report or for one seamless page, so a user doesn't have to log on to several different systems; I can present all the systems on one or two web pages.
Is that not a normal occurrence in web and database-driven design?
EDIT: Is this because almost all my original code is in classic ASP?
I really honestly think that most ORM designers don't take into account that users may want to access more than one database. This seems to be a pretty common limitation in the ORM universe.
Our client website runs across 3 databases, so I do this too. Actually, I'm condensing everything into views off of one central database, which then connects to the others.
I never considered this to be "normal" behavior though. I would guess that most of the time you would be designing for one system and working against that.
EDIT: Just to elaborate, we use LINQ to SQL for our data layer and we define the objects against the database views. This way we keep reports and application code working off the same data model. There is some extra work setting up the LINQ entities, because you have to manually define primary keys and set up associations; however, so far it has definitely proven worthwhile. We tried to do the same with Entity Framework, but had a lot of trouble getting the relationships set up appropriately and had to give up. The funny thing is, I had thought Entity Framework was supposed to be designed for more advanced scenarios like ours...
It is not uncommon to hit multiple databases during a single part of an application's workflow. However, in every instance that I have done it, this has been performed through several web service calls, which among other things wrap the databases in question.
I have not, to my knowledge, ever had a need to hit multiple databases directly at once and merge results into a single report.
I've seen this kind of architecture in corporate portals, where lots of data is pulled in via different data sources. The whole point of a portal is to bring siloed systems together; users might not want to use lots of systems in isolation (especially if they have to sign into each one). In that sort of scenario it is normal, particularly in a large company that has expanded rapidly and has a large number of heterogeneous systems.
In your case, whether this is the right thing to do depends on why you have these separate DBs.
With ORMs it may be a little difficult, but it can be done: pull the objects as needed from the various databases, then use them as a composite to create the actual object that is desired. If you can skip the ORM part of the process, you can query the databases directly and build your object from the results.
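A minimal sketch of that composite approach in ActiveRecord (the :crm and :billing entries in config/database.yml, the models, and the report shape are all assumptions):

```ruby
# each model is bound to its own database via entries assumed to
# exist in config/database.yml (here :crm and :billing)
class CrmUser < ActiveRecord::Base
  establish_connection :crm
end

class BillingAccount < ActiveRecord::Base
  establish_connection :billing
end

# compose rows from both databases into one plain report object,
# since a cross-database JOIN is not available
CustomerReport = Struct.new(:user, :account)

def build_customer_report(email)
  user    = CrmUser.where(email: email).first
  account = BillingAccount.where(email: email).first
  CustomerReport.new(user, account)
end
```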
Pulling data from two databases and compiling a report is not uncommon, but because cross-database queries cannot be optimized by the query engine of either database, OLTP systems typically use a single database, to keep the application performant.
If you are building the system from the ground up, it is not advisable to do it this way. If you are working with a system you didn't design, there is not much choice, and it is not uncommon (that is the difference between "organic" and "planned" growth).
Not counting master and various test instances, I hit nine databases on a regular basis. Yes, I inherited it, and yes, "Classic" ASP figures prominently. Of course, all the "brillant" designers of this mess are long gone. We're replacing it with things more sane as quickly as we safely can.
I would think that if you're building a new system and keep adding databases, getting to the point of two or three databases, it's probably time to re-think your design. OTOH, if you're aggregating data from multiple, disparate systems, then no, it's not that strange. Depending on the timeliness you need, your budget for throwing hardware at the problem, and whether your data is mostly static, this could be a good scenario for a "reporting server" that pulls the data down from the live server periodically.
