Rails + Postgres: Multi-tenancy done right? - ruby-on-rails

I am going to build an app using Rails. It uses multi-tenancy via the Apartment gem and PostgreSQL. The app will have users, each of whom has an account. This implies that each user has their own PostgreSQL schema; the users table is in the default schema.
Each user has his own list of customers in his own schema. A customer with the same email (essentially the same customer) can appear in multiple schemas. I want a customer to be able to log in and see all the users he's associated with. I can't put the customers table in the default/public schema because it's related to other tables that are not in the default schema.
What I thought I would do is create a link table between customers and users in the public schema. This table would contain the email of the customer and the id of the user. My issue with that is that I don't understand how well this would work with Rails. What I would like to achieve is something like customer.users.
So the question is: How should I approach this problem?

I created this library to help solve this issue:
https://github.com/Dandush03/pg_rls
Currently the best-known implementations are the Apartment gem (from Influitive), ActiveRecordMultiTenant (from Citus), and the Rails 6.1 way, database sharding.
The Apartment and Rails 6.1 approaches have many issues when dealing with a huge number of schemas or databases, mainly when you must run a migration at scale or change default values on a table: you need to run the migration on each tenant, which is not very cost-efficient. Citus's approach gets expensive in the long run.
Thankfully, PostgreSQL introduced a great solution, row-level security (RLS), in version 9.5. It had some performance issues that were solved in version 10. This approach lets you keep specific tables behind a 'special id', which can later be partitioned with PostgreSQL's newer tools.
My approach follows how PostgreSQL recommends implementing RLS, and most of my queries are executed as SQL statements, which helps with performance when running migrations. I tried to mix the best of Rails with the best of PostgreSQL functions.
What is even better about my approach is that when you start running PostgreSQL's special search functions, there is no downside, because all data is secure and in the same database. You also gain the ability to log in as a superuser and gather statistics.
I could keep going, but I think I have made my point. I encourage you to check out the gem. It probably still has some bugs (right now it only handles setting the tenant from the subdomain), but it does make my life easier on my ongoing projects. If it gets more support (likes), I will keep maintaining it and upgrading it into a more generic tool.

I suggest distinguishing between your users (who log in and are not part of a tenant) and the customers (which are kept separately, inside each tenant). The users table (possibly accompanied by other tables) can hold the information that assigns a user to a schema, customer, etc. I would not even use foreign keys to link the users table with the tables in the tenant, just to keep them really separate.
In short, the user table serves to authenticate and to authorize only.
Update: The question describes a multi-tenancy approach using separate database schemas for the individual tenants. In this setup, I would not link the users with the customers by database foreign keys, and I would not query them together. Just authenticate against the users and get the assigned tenant(s). After that, switch to the tenant.
If you really want to query both items (users and customers) in one run, I would not use separate schemas: One schema, create a tenant table, and put a foreign key into all other tables (customers etc.). In this scenario you could even do without a separate user table, and you could query the (single) customer table.
Update 2: To answer your query question:
You can query for schemas in PostgreSQL's metadata:
select schemaname from pg_tables where tablename = 'customer'
This gives you all schemas with a customer table. (PostgreSQL folds unquoted table names to lowercase, so match on the lowercase name.)
Using that information you can dynamically build a union select:
select name from schema1.customer
union all
select name from schema2.customer
union all
[...repeat for all schemas...]
to query all tables across schemas. You could use group by to eliminate duplicates.
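The dynamic union described above could be assembled in Ruby roughly as follows. This is a minimal sketch, not part of the original answer: `quote_ident` and `build_customer_union` are hypothetical helpers, the schema names are assumed to come from the pg_tables query, and identifiers are double-quoted so unusual schema names do not break the statement.

```ruby
# Hypothetical helpers (not from the original answer).
# Standard SQL identifier quoting: wrap in double quotes, double any inner quote.
def quote_ident(name)
  %("#{name.to_s.gsub('"', '""')}")
end

# Builds one UNION ALL query over the customer table of every tenant schema.
# The schema list is assumed to come from the pg_tables lookup.
def build_customer_union(schemas, table: "customer", column: "name")
  raise ArgumentError, "no schemas given" if schemas.empty?

  schemas
    .map { |s| "select #{quote_ident(column)} from #{quote_ident(s)}.#{quote_ident(table)}" }
    .join("\nunion all\n")
end

puts build_customer_union(%w[schema1 schema2])
```

Joining with `union` instead of `union all` would make the database drop duplicate rows itself.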

Related

Database design for reverse-auction platform

The idea is for a reverse auction platform where users post their auction for certain services and providers bid on it with their offers.
Should I be splitting my tables? For example the auction can be for a new service or to replace an existing service so there are questions that are specific to each selection.
Should I move those columns into a separate table for that option?
Here is a diagram of what I've come up with so far:
[Image: database diagram]
Am I on the right track here?
What data type should I use for columns where there will be a list of options to choose from in the auction form? For example, cash_back will give the user a range of choices such as:
Donate to Charity
Deposit to my account
Credit Voucher
Is the norm to use a string for this column with the respective strings or do I create a new table for the options and use the option_id as a foreign key in this table?
I think it is worth discussing here Rails philosophy and database design generally.
As I often say you can have a database for your application, or you can have an application for your database. In the latter, database design is important. In the former, it usually follows application design.
What this means is that you probably, assuming this is a Rails app, don't want to design your database at all. What you want to do is design your application object model and let Rails design your database. You won't get a great db design that way, but it will be good enough.
The tradeoff is that when you go this way, you often end up with the database as effectively owned by the application and it may not be safe to have other apps add or modify the data in the database. Moreover it may be harder to come up with really good reports, but where you put in most of your time will be better optimized (your main app).
TL;DR: If you want a database for your app and are using Rails or Django, then stop thinking about database design, but realize that while this optimizes some pathways today, it makes many other things harder down the road.

Can a Drupal based website's schema be imported to Rails?

I'm working on a website that needs to be rewritten in Rails. The website was previously built in Drupal, and there are almost 100,000 records in the database. In my opinion, Drupal has tables that have no place in Rails. For example,
Table name: node_type
It stores information regarding modules in Drupal.
Table name: node
It stores information for node(s) in Drupal.
Table name: semaphore # I've no idea what it is!
Table name: rdf_mapping # No idea
I haven't worked with Drupal, so my question is: is it possible to have a Rails schema into which the existing 100,000 records can be imported from Drupal? If so, how? If not, what other options do I have? Or do I have to design an entirely new database schema?
Drupal's database schema is not extensively documented for a reason: it is considered an implementation detail, is not a public API, and should not be accessed directly, especially by outside applications.
It is also very hard to document because, for a given site, any enabled module can add its own tables and alter existing ones (usually by adding columns). Plus, modules like Fields (part of Drupal core) create tables dynamically depending on the defined content types.
For a RoR developer, the Drupal schema will probably look weird and be uncomfortable to work with. I would follow the suggestion from others: create a new schema for your new application and write a migration script to move the data from the old Drupal database to your new database. I don't know about RoR, but try to find a good data migration tool that allows replay, updates, rollback, etc. You will probably have to migrate the data multiple times to fix bugs in the process.
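To illustrate what "replay" means for such a migration script, here is a minimal sketch in plain Ruby. Everything in it is an assumption for illustration: `transform_node` and `import!` are hypothetical helpers, the legacy field names (nid, title, created) are guesses about the old rows, and a Hash stands in for the new database. The point is only that records are keyed by their legacy id, so running the import again updates instead of duplicating.

```ruby
# Hypothetical sketch of a replayable import step. The field names
# (nid, title, created) are assumptions about the legacy rows.
def transform_node(row)
  {
    legacy_id: row.fetch(:nid),                 # keep the old id so reruns find the record
    title: row.fetch(:title).to_s.strip,
    created_at: Time.at(row.fetch(:created)).utc # assumes a Unix timestamp
  }
end

# Upsert into a Hash keyed by legacy_id; a second run overwrites existing
# entries rather than adding new ones. The Hash stands in for the new database.
def import!(rows, store)
  rows.each do |row|
    record = transform_node(row)
    store[record[:legacy_id]] = record
  end
  store
end

store = {}
rows = [{ nid: 1, title: " Hello ", created: 0 }]
import!(rows, store)
import!(rows, store) # replay: the store still holds a single record
puts store.size
```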
I don't have a straightforward answer, but I have some ideas for avoiding too many changes to the database. Alternatively, as suggested in the comment, you can write an SQL script to migrate the data to the Rails schema, converting types for each table. There may be more explicit solutions, and this is doable in many ways; you will probably need some customizations that override the default conventions. You can try the following:
Generate the related models, skipping migrations.
Set the table name explicitly in each model, as in the following snippet:
class Semaphore < ActiveRecord::Base
  self.table_name = "semaphore"
end
Define primary keys and foreign keys explicitly, both for record ids and for associations.
Either add timestamp columns or explicitly disable timestamps, like this:
ActiveRecord::Base.record_timestamps = false
These are the basic things I can see as important.

Multi-tenant rails application: what are the pros and cons of different techniques?

I originally wrote my Ruby on Rails application for one client. Now, I am changing it so that it can be used for different clients. My end-goal is that some user (not me) can click a button and create a new project. Then all the necessary changes (new schema, new tables, handling of code) are generated without anyone needing me to edit a database.yml file or add new schema definitions. I am currently using the SCOPED access. So I have a project model and other associated models have a project_id column.
I have looked at other posts regarding multi-tenant applications in Rails. A lot of people seem to suggest creating a different schema for each new client in Postgres. For me, however, a different schema per client is not very useful in terms of the data model. Each client will have the same tables, rows, columns, etc.
My vision for each client is that my production database first has a table of different projects/clients. And each one of those tables links to a set of tables that are pretty much the same with different data. In other terms a table of tables. Or in other terms, the first table will map to a different set of data for each client that has the same structure.
Is the way I explained my vision at all similar to the way that Postgres implements different "schemas"? Does it look like nested tables? Or does Postgres have to query all the information in the database anyway? I do not currently use Postgres, but I would be willing to learn if it fits the design. If you know of database software that works with Rails that fits my needs, please do let me know.
Right now, I am using scopes to accomplish multi-tenancy, but it does not feel scalable or clean. It does, however, make it very easy for a non-technical user to create a new project, provided I give them fields to fill in. Do you know if it is possible with the multi-schema Postgres approach to have it work automatically after a user clicks a button? I would prefer that this be handled by Rails and not by an external script, if possible. (Please do advise either way.)
Most importantly, do you recommend any plugins, or should I adopt a different framework for this task? I have found Rails to be limited in some cases of abstraction like the above, and this is the first time I have run into a Rails scaling issue.
Any advice related to multi-tenant applications or my situation is welcome. Any questions for clarification or additional advice are welcome as well.
Thanks,
--Dave
MSDN has a good introduction to multi-tenant data architecture.
At one end of the spectrum, you have one database per tenant ("shared nothing"). "Shared nothing" makes disaster recovery pretty simple, and has the highest degree of isolation between tenants. But it also has the highest average cost per tenant, and it supports the fewest tenants per server.
At the other end of the spectrum, you store a tenant id number in every row of every shared table ("shared everything"). "Shared everything" makes disaster recovery hard--for a single tenant, you'd have to restore just some rows in every shared table--and it has the lowest degree of isolation. (Badly formed queries can expose private data.) But it has the lowest cost per tenant, and it supports the highest number of tenants per server.
"My vision for each client is that my production database first has a table of different projects/clients. And each one of those tables links to a set of tables that are pretty much the same with different data. In other terms a table of tables. Or in other terms, the first table will map to a different set of data for each client that has the same structure."
This sounds like you're talking about one schema per tenant. Pay close attention to permissions (the SQL GRANT and REVOKE statements, and ALTER DEFAULT PRIVILEGES).
There are two Railscasts on multitenancy: one using scopes and subdomains, and another on handling multiple schemas.
There is also the multitenant gem, which could help with your scopes, and the apartment gem for handling multiple schemas.
Here is also a good presentation on multitenancy-with-rails.
Don't forget about default scopes. While creating named scopes the way you are now works, it does feel like it could be done better. I came across this guide by Samuel Kadolph on the issue a few months ago; it looks like it could work well for your situation, with the benefit of keeping your application free of PostgreSQL-only features.
Basically, the setup he describes adds the concept of tenants to your application and then uses it to scope the data at query time in the database.
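The idea of scoping every read by the current tenant can be sketched without any framework. The following is a minimal in-memory illustration, not the guide's actual code: `CurrentTenant` and `ProjectStore` are hypothetical names, and an Array of Hashes stands in for a table with a tenant_id column.

```ruby
# Hypothetical in-memory sketch of tenant scoping; no ActiveRecord involved.
module CurrentTenant
  def self.id
    Thread.current[:tenant_id]
  end

  # Sets the tenant for the duration of the block, then restores the old value,
  # even if the block raises.
  def self.with(tenant_id)
    previous = id
    Thread.current[:tenant_id] = tenant_id
    yield
  ensure
    Thread.current[:tenant_id] = previous
  end
end

class ProjectStore
  def initialize(rows)
    @rows = rows
  end

  # Every read goes through the tenant filter, much like a default scope would.
  def all
    tenant = CurrentTenant.id
    raise "no tenant set" if tenant.nil?

    @rows.select { |r| r[:tenant_id] == tenant }
  end
end

store = ProjectStore.new([
  { tenant_id: 1, name: "alpha" },
  { tenant_id: 2, name: "beta" }
])
names = CurrentTenant.with(1) { store.all.map { |r| r[:name] } }
p names
```

Refusing to answer when no tenant is set is a deliberate choice in this sketch: a forgotten scope fails loudly instead of returning every tenant's rows.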

Should I use multiple databases?

I am about to create an application with Ruby on Rails and I would like to use multiple databases. Basically, it is an accounting app that will have multiple companies for each user. I would like to create a database for each company.
I found this post http://programmerassist.com/article/302
But I would like to read more thoughts about this issue.
I have to decide between MySQL and PostgreSQL. Which database might better fit my problem?
There are several options for handling a multi-tenant app.
Firstly, you can add a scope to your tables (as suggested by Chad Birch - using a company_id). For most use-cases this is fine. If you are handling data that is secure/private (such as accounting information) you need to be very careful about your testing to ensure data remains private.
You can run your system using multiple databases. You can have a single app that uses a database for each client, or you can actually have a separate app for each client. Running a database for each client cuts a little against the grain in Rails, but it is doable. Depending on the number of clients you have and the load expectations, I would actually suggest having a look at running individual apps. With some work on your deployment setup (capistrano, chef, puppet, etc.) you can make this a very streamlined process. Each client runs in a completely unique environment, and if a particular client has high loads you can spin them out to their own server.
If using PostgreSQL, you can do something similar using schemas.
PostgreSQL schemas provide a very handy way of isolating your data from different clients. A database contains one or more named schemas, which in turn contain tables. You need to add some smarts to your migrations and deployments, but it works really well.
Inside your Rails application, you attach filters to the request that switch the current user's schema on or off.
Something like:
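before_action :set_app # called before_filter before Rails 4

def set_app
  current_app = App.find_by_subdomain(...)
  schema = current_app.schema
  set_schema_path(schema)
end

# Put the tenant schema first so its tables are found before the defaults.
def set_schema_path(schema)
  connection = ActiveRecord::Base.connection
  # Quote the schema name as an identifier: it is interpolated straight into SQL.
  connection.execute("SET search_path TO #{connection.quote_column_name(schema)}, #{connection.schema_search_path}")
end

# Restore the default search path.
def reset_schema_path
  connection = ActiveRecord::Base.connection
  connection.execute("SET search_path TO #{connection.schema_search_path}")
end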
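Because the schema name in the filter above is derived from the request, it is worth validating before it reaches SQL. A minimal sketch of that step in plain Ruby follows; `SCHEMA_NAME` and `search_path_sql` are hypothetical names, and the allowed pattern (lowercase letters, digits, underscores) is an assumption about how tenant schemas are named.

```ruby
# Hypothetical helper: validates a tenant schema name before it is
# interpolated into SET search_path, so a crafted subdomain cannot inject SQL.
# The allowed pattern is an assumption about how tenant schemas are named.
SCHEMA_NAME = /\A[a-z_][a-z0-9_]*\z/

def search_path_sql(schema, default_path = "public")
  unless SCHEMA_NAME.match?(schema.to_s)
    raise ArgumentError, "invalid schema name: #{schema.inspect}"
  end

  # The tenant schema goes first so it is searched before the default path.
  %(SET search_path TO "#{schema}", #{default_path})
end

puts search_path_sql("tenant_42")
```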
The problem with answers about multiple databases is when they come from people who don't have a need or experience with multiple databases. The second problem is that some databases just don't allow for switching between multiple databases, including allowing users to do their own backup and recovery and including scaling to point some users to a different data server. Here is a link to a useful video
http://aac2009.confreaks.com/06-feb-2009-14-30-writing-multi-tenant-applications-in-rails-guy-naor.html
This link will help with Ruby on Rails with Postgresql.
I currently have a multi-tenant, multi-database, multi-user (many logons to the same tenant with different levels of access) online SaaS application. There are actually two applications: one is in the accounting category and the other is banking. Both apps are built on the same structure and methods. A client-user (tenant) can switch databases under that user's logon. An agent-user, such as a tax accountant, can switch between databases for his clients only. A super-user can switch to any database. There is one data dictionary, i.e. only one place where tables and columns are defined. There is global data and local data. Global data, such as a master chart-of-accounts, is available to everyone (read only). Local data is the user's database. A new user can get a clone of a master database. There are multiple clones to choose from. A super-user can maintain the clone databases.
The problem is that it is in COBOL and uses ISAM files and uses the CGI method. The problem with this is a) there is a perception that COBOL is outdated, b) getting trained people, c) price and d) online help. Otherwise it works and I'm happy with it.
So I'm researching what to replace it with and what a minefield that is.
Some time has passed, and the decision was to use PostgreSQL schemas to make the application multi-tenant. I have a schema called common where shared data is stored.
# app/models/organisation.rb
class Organisation < ActiveRecord::Base
  self.table_name = 'common.organisations'
  # set relationships as usual
end

# app/models/user.rb
class User < ActiveRecord::Base
  self.table_name = 'common.users'
  # set relationships as usual
end
For migrations I followed this excellent tutorial: http://timnew.github.com/blog/2012/07/17/use-postgres-multiple-schema-database-in-rails/ It is far better than what I saw elsewhere, including the approach Ryan Bates showed on Railscasts.
When a new organisation is created, a new schema is created, named after the organisation's subdomain. I have read that using different schemas is not a good idea, but it depends on the job you are doing; this app has almost no social component, so it's a good fit.
No, you shouldn't use multiple databases.
I'm not really sure what advice to give you though, it seems like you have some very basic misunderstandings about database design, you may want to educate yourself on the basics of databases first, before going further.
You most likely just want to add a "company id" type column to your tables to identify which company a particular record belongs to.

Any thoughts on Multi-tenant versus Multi-database apps in Rails

Our app currently spawns a new database for each client. We're starting to wonder whether we should consider refactoring this to a multi-tenant system.
What benefits / trade-offs should we be considering? What are the best practices for implementing a multi-tenant app in Rails?
I've been researching the same thing and just found this presentation, which offers an interesting solution: using Postgres's schemas (a bit like namespaces) to separate data at the DB level while keeping all tenants in the same DB and staying (mostly) transparent to Rails.
Writing Multi-Tenant Applications in Rails - Guy Naor
Multi-tenant systems will introduce a whole range of issues for you. My quick thoughts are below
All SQL must be examined and refactored to include a ClientId value.
All indexes must be examined to determine whether the ClientId needs to be included.
An error in a SQL statement by a developer/sysadmin in production will affect all of your customers.
A database corruption/problem will affect all of your customers.
You have some data privacy issues, whereby poor code/implementation could allow CustomerA to see data belonging to CustomerB.
A customer using your system in a heavy/aggressive manner may affect other customers' perception of performance.
Tailoring static data to an individual customer's preference becomes more complex.
I'm sure there are a number of other issues but these were my initial thoughts.
It really depends upon what you're doing.
We are making a MIS program for the print industry that tracks inventory, employees, customers, equipment, and does some serious calculations to estimate costs of performing jobs based on a lot of input variables.
We are anticipating very large databases for each customer, and we currently have 170 tables. Adding another column to almost every table just to store the client_id hurts my brain.
We are currently in the beta stage of our program, and here are some things that we have encountered:
Migrations: Rails assumes that you will only have one database. You can adapt it for multiple databases, and migrations are one of the things you have to adapt. You need a custom rake task to apply migrations to all existing databases. Be prepared to do a lot of troubleshooting, because a migration may succeed on one DB but fail on another.
Spawning Databases: How do you create a new db? From a SQL file, by copying an existing db, or by running all migrations? How do you keep your schema consistent between your table creation system and your live databases?
Connecting to the appropriate database: We use a cookie to store a unique value that maps to the correct DB. We use a before filter in an Authorized controller that inherits from ActionController; it gets the db from that unique value and calls the establish_connection method on a subclass of ActiveRecord::Base. This allows us to have some models pull from a common db and others from the client's specific db.
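Since a migration may succeed on one database and fail on another, the custom rake task mentioned under Migrations probably should not stop at the first failure. A minimal sketch of that loop in plain Ruby follows; `migrate_all` is a hypothetical helper, the database names are made up, and the block stands in for whatever actually connects and migrates.

```ruby
# Hypothetical sketch: apply a migration step to every tenant database and
# report failures at the end instead of stopping at the first one.
def migrate_all(databases)
  failures = {}
  databases.each do |name|
    begin
      yield name # the block stands in for "connect to this db and migrate it"
    rescue StandardError => e
      failures[name] = e.message
    end
  end
  failures
end

# Made-up database names; client_b simulates a migration that fails.
failures = migrate_all(%w[client_a client_b client_c]) do |db|
  raise "column already exists" if db == "client_b"
end
p failures
```

Collecting failures per database gives a list to troubleshoot afterwards while the remaining tenants still get migrated.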
If you have specific questions about any of these, I can help.
I don't have any experience with this personally, but during the lightning talks at the 2009 Ruby Hoedown, Andrew Coleman presented a plugin he designed and uses for multi-tenant databases in rails w/ subdomains. You can check out the lightning talk slides and here's the acts_as_restricted_subdomain repository.
Why would you? Do you have heavy aggregation between users or are you spawning too many DBs? Have you considered using SQLite files per tenant instead of shared DB servers (since multitenant apps often are low-profile and don't need that much concurrency)?

Resources