Database sharding and Rails - ruby-on-rails

What's the best way to deal with a sharded database in Rails? Should the sharding be handled at the application layer, the active record layer, the database driver layer, a proxy layer, or something else altogether? What are the pros and cons of each?

FiveRuns have a gem named DataFabric that does application-level sharding and master/slave replication. It might be worth checking out.

I assume with shards we're talking about horizontal partitioning and not vertical partitioning (here are the differences on Wikipedia).
First off, stretch vertical partitioning as far as you can take it before you consider horizontal partitioning. It's easy in Rails to have different models point to different machines and for most Rails sites, this will bring you far enough.
For horizontal partitioning, in an ideal world, this would be handled at the application layer in Rails. But while it's not hard, it's not trivial in Rails, and by the time you need it, usually your application has grown beyond the point where this is feasible since you have ActiveRecord calls sprinkled all over the place. And no one, developers or management, likes working on it before you need it since everyone would rather work on features users will use now rather than on partitioning which may not come into play for years after your traffic has exploded.
ActiveRecord layer... not easy from what I can see. Would require lots of monkey patching into Rails internals.
At Spock we ended up handling this using a custom MySQL proxy and open sourced it on SourceForge as Spock Proxy. ActiveRecord thinks it's talking to one MySQL database machine when reality it's talking to the proxy, which then talks to one or more MySQL databases, merges/sorts the results, and returns them to ActiveRecord. Requires only a few changes to your Rails code. Take a look at the Spock Proxy SourceForge page for more details and for our reasons for going this route.

For those of you like me who hadn't heard of sharding:
http://highscalability.com/unorthodox-approach-database-design-coming-shard

rails 6.1 provides ability to switch connection per database thus we can do the horizontal partitioning.
Shards are declared in the three-tier config like this:
production:
primary:
database: my_primary_database
adapter: mysql2
primary_replica:
database: my_primary_database
adapter: mysql2
replica: true
primary_shard_one:
database: my_primary_shard_one
adapter: mysql2
primary_shard_one_replica:
database: my_primary_shard_one
adapter: mysql2
replica: true
Models are then connected with the connects_to API via the shards key
class ApplicationRecord < ActiveRecord::Base
self.abstract_class = true
connects_to shards: {
default: { writing: :primary, reading: :primary_replica },
shard_one: { writing: :primary_shard_one, reading: :primary_shard_one_replica }
}
end
Then models can swap connections manually via the connected_to API. If using sharding, both a role and a shard must be passed:
ActiveRecord::Base.connected_to(role: :writing, shard: :shard_one) do
#id = Person.create! # Creates a record in shard one
end
ActiveRecord::Base.connected_to(role: :writing, shard: :shard_one) do
Person.find(#id) # Can't find record, doesn't exist because it was created
# in the default shard
end
reference:
https://edgeguides.rubyonrails.org/active_record_multiple_databases.html#horizontal-sharding
https://dev.to/ritikesh/multitenant-architecture-on-rails-6-1-27c7

Connecting Rails to multiple databases is not a big deal- you simply have an ActiveRecord subclass for each shard that overrides the connection property. That makes it pretty simple if you need to make cross-shard calls. You then just have to write a little code when you need to make calls between the shards.
I don't like Hank's idea of splitting the rails instances, because it seems challenging to call the code between the instances unless you have a big shared library.
Also you should look at doing something like Masochism before you start sharding.

For rails to work with replicated environment, I would suggest using my_replication plugin which helps switch database connection to one of the slaves at run-time
https://github.com/minhnghivn/my_replication

To my mind, the simplest way is maintain a 1:1 between rails instances and DB shards.

Proxy layer is better, it can support all program languages.
For example: Apache ShardingSphere' proxy.
There are 2 different products of Apache ShardingSphere, ShardingSphere-JDBC for application layer which for Java language only and ShardingSphere-Proxy for proxy layer which for all program languages.
FYI: https://shardingsphere.apache.org/document/current/en/user-manual/shardingsphere-proxy/

Depends upon rails version. Newer rails version provide support for sharding as said by #Oshan. But if you can't update to a newer version you can use the octopus gem.
Gem Link
https://github.com/thiagopradi/octopus

Related

How to set transaction isolation level using ActiveRecord connection?

I need to manage transaction isolation level on a per-transaction basis in a way portable across databases (SQLite, PostgreSQL, MySQL at least).
I know I can do it manually, like that:
User.connection.execute('SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE')
...but I would expect something like:
User.isolation_level( :serializable ) do
# ...
end
This functionality is supported by ActiveRecord itself:
MyRecord.transaction(isolation: :read_committed) do
# do your transaction work
end
It supports the ANSI SQL isolation levels:
:read_uncommitted
:read_committed
:repeatable_read
:serializable
This method is available since Rails 4, it was unavailable when the OP asked the question. But for any decently modern Rails application this should be the way to go.
There was no gem available so I developed one (MIT): https://github.com/qertoip/transaction_isolation
Looks Rails4 would have the feature out of box:
https://github.com/rails/rails/commit/392eeecc11a291e406db927a18b75f41b2658253

mysql2 driver seems to write invalid queries

I'm developing an application layer on top of a rails app developed by someone else.
His application uses a module called request_logger to write to a table, which worked fine under ruby1.8/rails2/mysql gem, but in my ruby1.9/rails3/mysql2 environment, activerecord falls over, suggesting that the generated query is invalid.
It obviously is, all mysql relation names are wrapped in double quotes instead of backticks.
The call to activerecord itself just sets a bunch of attributes with
log.attributes = {
:user_id => user_id,
:controller => controller,
...etc
}
and then calls
log.save
So I'm leaning towards it not being dodgy invocation. Any suggestions?
mysql2 works fine for a lot of people, but it unashamedly sacrifices conformance to the MySQL C API for performance in the common tasks. Perhaps, if request_logger is low-level enough, it's expecting calls to exist which don't.
It's trivial to switch back to using mysql - give it a try, and if it works, stick with it. Remember to change both your Gemfile and your config/database.yml settings.
It turned out to be what seems to be a change in behaviour between rails 2 and 3 (we have the same setup working fine in rails 2)
We use database.yml to specify an (empty) "master" database and then feed in our clients with shards+octopus.
The master db is sqlite for simplicity, and it seems that activerecord was feeding off requests formatted for sqlite to the mysql2 shards, regardless of their adaptor type.

Different database connections in rails based on user params

I'm refactoring some features of a legacy php application which uses multiple databases with the same structure, one per language. When an user login choose his language and then all the following connections are made with the db for that app with some code like this:
$db_name = 'db_appname_' . $_SESSION['language'];
mysql_connect(...);
mysql_select_db($db_name);
I'd like to refactor also the database, but currently it's not an option because other pieces of software should remain in production with the old structure while the new app is developed, and for some time after it's been developed.
I saw this question, but both the question and the suggested gems are pretty old and it seems that they are not working with Rails 3.
Which is the best method to achieve this behavior in my new rails 3 app? Is there any other choice that avoid me to alter the db structure and fits my needs?
Last detail: in the php app even the login information are kept in separate tables, i.e. every db has its own users table and when the user logs in it also passes a language param in the login form. I'd like to use devise for auth in the new app which likely won't work with this approach, so I'm thinking to duplicate (I know, this is not DRY) login information in a separate User model with a language attribute and a separate db shared among languages to use devise features with my app. Will this cause any issue?
EDIT:
For completeness, I ended with this yml configuration file
production: &production
adapter: mysql
host: localhost
username: user
password: secret
timeout: 5000
production_italian:
<<: *production
database: db_app_ita
production_english:
<<: *production
database: db_app_eng
and with this config in base model (actually not exactly this but this is for keeping things clear)
MyModel < AR::Base
establish_connection "production_#{session[:language]}"
...
end
use establish_connection in your models:
MyModel < AR::Base
establish_connection "db_appname_#{session[:language]}"
...
end
Use MultiConfig gem I created to make this easy.
You could specify the configs in separate file like database_italian.yml etc and then call:
ActiveRecord::Base.config_file = 'database_italian'
This way it will be much easier to maintain and cleaner looking. Just add more language db config files as you wish

Multiple databases in Rails

Can this be done? In a single application, that manages many projects with SQLite.
What I want is to have a different database for each project my app is managing.. so multiple copies of an identically structured database, but with different data in them. I'll be choosing which copy to use base on params on the URI.
This is done for 1. security.. I'm a newbe in this kind of programming and I don't want it to happen that for some reason while working on a Project another one gets corrupted.. 2. easy backup and archive of old projects
Rails by default is not designed for a multi-database architecture and, in most cases, it doesn't make sense at all.
But yes, you can use different databases and connections.
Here's some references:
ActiveRecord: Connection to multiple databases in different models
Multiple Database Connections in Ruby on Rails
Magic Multi-Connections
If you are able to control and configure each Rails instance, and you can afford wasting resources because of them being on standby, save yourself some trouble and just change the database.yml to modify the database connection used on every instance. If you are concerned about performance this approach won't cut it.
For models bound to a single unique table on only one database you can call establish_connection inside the model:
establish_connection "database_name_#{RAILS_ENV}"
As described here: http://apidock.com/rails/ActiveRecord/Base/establish_connection/class
You will have some models using tables from one database and other different models using tables from other databases.
If you have identical tables, common on different databases, and shared by a single model, ActiveRecord won't help you. Back in 2009 I required this on a project I was working on, using Rails 2.3.8. I had a database for each customer, and I named the databases with their IDs. So I created a method to change the connection inside ApplicationController:
def change_database database_id = params[:company_id]
return if database_id.blank?
configuration = ActiveRecord::Base.connection.instance_eval { #config }.clone
configuration[:database] = "database_name_#{database_id}_#{RAILS_ENV}"
MultipleDatabaseModel.establish_connection configuration
end
And added that method as a before_filter to all controllers:
before_filter :change_database
So for each action of each controller, when params[:company_id] is defined and set, it will change the database to the correct one.
To handle migrations I extended ActiveRecord::Migration, with a method that looks for all the customers and iterates a block with each ID:
class ActiveRecord::Migration
def self.using_databases *args
configuration = ActiveRecord::Base.connection.instance_eval { #config }
former_database = configuration[:database]
companies = args.blank? ? Company.all : Company.find(args)
companies.each do |company|
configuration[:database] = "database_name_#{company[:id]}_#{RAILS_ENV}"
ActiveRecord::Base.establish_connection configuration
yield self
end
configuration[:database] = former_database
ActiveRecord::Base.establish_connection configuration
end
end
Note that by doing this, it would be impossible for you to make queries within the same action from two different databases. You can call change_database again but it will get nasty when you try using methods that execute queries, from the objects no longer linked to the correct database. Also, it is obvious you won't be able to join tables that belong to different databases.
To handle this properly, ActiveRecord should be considerably extended. There should be a plugin by now to help you with this issue. A quick research gave me this one:
DB-Charmer: http://kovyrin.github.com/db-charmer/
I'm willing to try it. Let me know what works for you.
I got past this by adding this to the top of my models using the other database
class Customer < ActiveRecord::Base
ENV["RAILS_ENV"] == "development" ? host = 'devhost' : host = 'prodhost'
self.establish_connection(
:adapter => "mysql",
:host => "localhost",
:username => "myuser",
:password => "mypass",
:database => "somedatabase"
)
You should also check out this project called DB Charmer:
http://kovyrin.net/2009/11/03/db-charmer-activerecord-connection-magic-plugin/
DbCharmer is a simple yet powerful plugin for ActiveRecord that does a few things:
Allows you to easily manage AR models’ connections (switch_connection_to method)
Allows you to switch AR models’ default connections to a separate servers/databases
Allows you to easily choose where your query should go (on_* methods family)
Allows you to automatically send read queries to your slaves while masters would handle all the updates.
Adds multiple databases migrations to ActiveRecord
It's worth noting, in all these solutions you need to remember to close custom database connections. You will run out of connections and see weird request timeout issues otherwise.
An easy solution is to clear_active_connections! in an after_filter in your controller.
after_filter :close_custom_db_connection
def close_custom_db_connection
MyModelWithACustomDBConnection.clear_active_connections!
end
in your config/database.yml do something like this
default: &default
adapter: postgresql
encoding: unicode
pool: 5
development:
<<: *default
database: mysite_development
test:
<<: *default
database: mysite_test
production:
<<: *default
host: 10.0.1.55
database: mysite_production
username: postgres_user
password: <%= ENV['DATABASE_PASSWORD'] %>
db2_development:
<<: *default
database: db2_development
db2_test:
<<: *default
database: db2_test
db2_production:
<<: *default
host: 10.0.1.55
database: db2_production
username: postgres_user
password: <%= ENV['DATABASE_PASSWORD'] %>
then in your models you can reference db2 with
class Customers < ActiveRecord::Base
establish_connection "db2_#{Rails.env}".to_sym
end
What you've described in the question is multitenancy (identically structured databases with different data in each). The Apartment gem is great for this.
For the general question of multiple databases in Rails: ActiveRecord supports multiple databases, but Rails doesn’t provide a way to manage them. I recently created the Multiverse gem to address this.
The best solution I have found so far is this:
There are 3 database architectures that we can approach.
Single Database for Single Tenant
Separate Schema for Each Tenant
Shared Schema for Tenants
Note: they have certain pros and cons depends on your use case.
I got this from this Blog! Stands very helpful for me.
You can use the gem Apartment for rails
Video reference you may follow at Gorails for apartment
As of Rails 6, multiple databases are supported: https://guides.rubyonrails.org/active_record_multiple_databases.html#generators-and-migrations
Sorry for the late and obvious answer, but figured it's viable since it's supported now.

Rails Console Doesn't Automatically Load Models For 2nd DB

I have a Rails project which has a Postgres database for the actual application but which needs to pull a heck of a lot of data out of an Oracle database.
database.yml looks like
development:
adapter: postgresql
database: blah blah
...
oracle_db:
adapter: oracle
database: blah blah
My models which descend from data on the Oracle DB look something like
class LegacyDataClass < ActiveRecord::Base
establish_connection "oracle_db"
set_primary_key :legacy_data_class_id
has_one :other_legacy_class, :foreign key => :other_legacy_class_id_with_funny_column_name
...
end
Now, by habit I often do a lot of my early development (and this is early development) by coding for a bit and then playing in the Rails console. For example, after defining all the associations for LegacyDataClass I'll start trying things like a = LegacyDataClass.find(:first); puts a.some_association.name. Unexpectedly, this dies with LegacyDataClass not being already loaded.
I can then require 'LegacyDataClass' which fixes the problem until I either need to reload!, which won't actually reload it, or until I open a new instance of the console.
Thus the questions:
Why does this happen? Clearly there is some Rails magic I am not understanding.
What is the convenient Rails workaround?
I believe this might have to do with your model name, rather than your connection. The Rails convention is that model class names are CamelCase, while the files they reside in are lowercase+underscore.
The "LegacyModel" class should therefore be in models/legacy_model.rb. Your statement about "require 'LegacyDataClass'" indicates that this is not the case, and therefore Rails doesn't know how to automagically load that model.
I wrote something for an app at work that handles connections to other databases' at runtime, it might be able to help.
http://github.com/cherring/connection_ninja

Resources