Rails+PostgreSQL: search_path depending on subdomain - ruby-on-rails

In our rails 2.x application the search_path of the database connection depends on the subdomain through which the application is contacted (basically search_path = "production_"+subdomain). Because the search_path is defined per connection and database connections are shared over requests, even concurrently, this is a problem. I would rather not change concurrency to only serve one request at a time for obvious reasons.
So is there a way to group the database connections in the connection pool and set some kind of policy so that only a fitting connection is used for the request? Or is there a way to use one connection pool per subdomain (where the pools are automatically discarded after a timeout)? Starting a rails instance for each subdomain is not an option because there might be many idle subdomains (it's a kind of pro-account where you get a subdomain and your own "world" that differs from the rest of the site in some tables).
What would be the best solution for this problem?

You can just set connection.search_path at the beginning of the request, before any objects are loaded, and you'll be fine. In our case we have a Rack app that wraps our rails app and does this setup for us based on the incoming domain.
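A minimal sketch of such a wrapper, assuming the "production_" + subdomain naming from the question. The class name SearchPathBySubdomain and the connection_source callable are hypothetical; in a Rails app the callable would typically be -> { ActiveRecord::Base.connection }, whose schema_search_path= setter is provided by the PostgreSQL adapter. The sketch also assumes a two-label base domain such as example.com.

```ruby
# Rack-style middleware: derive the schema from the subdomain and set
# search_path on every request, before the wrapped app loads any objects.
class SearchPathBySubdomain
  DEFAULT_SEARCH_PATH = "public"

  # connection_source (hypothetical name) is any callable returning an object
  # that responds to schema_search_path=.
  def initialize(app, connection_source)
    @app = app
    @connection_source = connection_source
  end

  def call(env)
    # Set the path unconditionally so a value left behind by an earlier
    # request on the same connection is always overwritten.
    @connection_source.call.schema_search_path = search_path_for(env["HTTP_HOST"].to_s)
    @app.call(env)
  end

  private

  # "acme.example.com:3000" -> "production_acme"; hosts without a subdomain,
  # or with characters outside [a-z0-9_], fall back to the default path.
  def search_path_for(host)
    labels = host.split(":").first.to_s.downcase.split(".")
    subdomain = labels.length > 2 ? labels.first : nil
    return DEFAULT_SEARCH_PATH unless subdomain && subdomain.match?(/\A[a-z0-9_]+\z/)

    "production_#{subdomain}"
  end
end
```

Because the path is set on every request, including requests that use the default, a pooled connection never carries a previous tenant's path into the next request.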

Related

Would it be possible to have multiple database connection pools in rails to switch between?

A little background
I have been using the Apartment gem for running a multi-tenancy app for years. Recently the need to scale the database out onto separate hosts has arrived: the db server simply can't keep up any more (both reads and writes are getting too much), and yes, I scaled the hardware to the max (dedicated hardware, 64 cores, 12 NVMe drives in RAID 10, 384 GB RAM etc.).
I was considering doing this per-tenant (1 tenant = 1 database connection config / pool) as that would be a "simple" and efficient way to get up to number-of-tenants-times more capacity without doing loads of application code changes.
Now, I am running rails 4.2 atm., soon upgrading to 5.2. I can see that rails 6 adds support for per-model connection definitions, however that is not really what I need, as I have a completely mirrored database schema for each of my 20 tenants. Typically I switch "database" per request (in middleware) or per background job (sidekiq middleware), however this is currently trivial and handled by the Apartment gem, as it just sets the search_path in PostgreSQL and does not really change the actual connection. When switching to a per-tenant hosting strategy I will need to switch the entire connection per request.
Questions:
I understand that I could do an ActiveRecord::Base.establish_connection(config) per request / background job - however, as I also understand, that triggers an entirely new database connection handshake to be made and a new db pool to spawn in rails - right? I guess that would be a performance suicide to make that kind of overhead on every single request to my application.
I am therefore wondering if anyone can see the option with rails of e.g. pre-establishing multiple (total of 20) database connections/pools from the beginning (e.g. on boot of the application), and then just switching between those pools per request? So that the db connections are already made and ready to be used.
Is all this just a poor idea, and should I instead look for a different approach? E.g. 1 app instance = one specific connection to one specific tenant. Or something else.
As I understand, there are 4 patterns for multi-tenancy apps:
1. Dedicated model/Multiple Production Environments
Each app instance or database instance hosts an entirely separate tenant application and nothing is shared among tenants.
This is 1 app instance and 1 database per tenant. Development is as easy as if you served only 1 tenant, but it becomes a nightmare for devops if you have, say, 100 tenants.
2. Physical Segregation of Tenants
1 app instance for all tenants but 1 database per tenant. This is what you are searching for. You can use ActiveRecord::Base.establish_connection(config), use gems, or update to Rails 6 as others suggest. See the answer for (2) below.
3. Isolated schema model/Logical Segregations
In an Isolated Schema, the tenant tables or database components are grouped under a logical schema or namespace and separated from other tenant schemas, however the schemas are hosted in the same database instance.
1 app instance and 1 database for all tenants, like you do with the apartment gem.
4. Partially Isolated Component
In this model, components that have common functionalities are shared among tenants while components with unique or unrelated functions are isolated. At the data layer, common data such as data that identify tenants are grouped or kept in single table while tenant specific data are isolated at table or instance layer.
As for (1), ActiveRecord::Base.establish_connection(config) does not perform a handshake with the db on every request if you use it correctly. You can check here and read all the comments here.
As for (2), if you don't want to use establish_connection, you can use the multiverse gem (it works for rails 4.2), or other gems. Or, as others suggest, you can update to Rails 6.
Edit: The multiverse gem uses establish_connection. It appends to database.yml and creates a base class so that each subclass shares the same connection/pool. Basically, it reduces the effort of using establish_connection.
As for (3), the answer:
If you don't have many tenants, and your application is pretty complex, I suggest you use the Dedicated Model pattern. So you go for 1 app instance = one specific connection to one specific tenant. You don't have to make your app more complex by adding multiple database connections.
But if you have many tenants, I suggest you use Physical Segregation of Tenants or Partially Isolated Component, depending on your business process.
Either way, you have to update/rewrite your application to comply with the new architecture.
Just a couple of days ago horizontal sharding was added to Ruby on Rails' master branch on GitHub. Currently, this feature is not officially released but depending on your application's Rails version you might want to consider using Rails master by adding this to your Gemfile:
gem "rails", github: "rails/rails", branch: "master"
With this new feature, you can take advantage of Rails' database connection pool and switch the database based on conditions.
I haven't used this new feature, but it seems pretty straight-forward:
# in your config/database.yml
production:
  primary:
    database: my_database
    # other config: user, password, etc
  primary_tenant_1:
    database: tenant_1_database
    # other config: user, password, etc
# in your controller for example when updating a tenant
ActiveRecord::Base.connected_to(shard: "primary_tenant_#{tenant.database_shard_number}") do
  tenant.save
end
You didn't add much detail about how you determine the tenant number or how authorization is done in your application. But I would try to determine the tenant number as soon as possible in the application_controller in an around_action. Something like this might be a starting point:
around_action :determine_database_connection
private
def determine_database_connection
  # assuming you have a method to determine the current_tenant and that tenant
  # has a method that returns the number of the shard to use or even the
  # full shard identifier
  shard = current_tenant.database_shard # returns for example `:primary_tenant_1`
  ActiveRecord::Base.connected_to(shard: shard) do
    yield
  end
end
From what I understand, (2) should be possible with manual connection switching in Rails 6.
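The idea of pre-establishing one pool per tenant at boot and only selecting among them per request can be sketched in plain Ruby. This is an illustration of the pattern, not ActiveRecord's API: TenantPools, with_tenant and current are hypothetical names, and the block passed to new stands in for whatever actually creates a connection pool for a tenant's database host.

```ruby
# Registry of per-tenant pools: built once at boot, selected per request.
class TenantPools
  class UnknownTenant < StandardError; end

  # configs maps a tenant name to its connection settings. The block builds
  # one pool per config; it runs only here, never per request.
  def initialize(configs)
    @pools = configs.each_with_object({}) do |(name, config), pools|
      pools[name] = yield(config)
    end.freeze
  end

  # Selects the tenant's pool for the duration of the block. Thread.current[]
  # is fiber-local, so concurrent requests do not see each other's selection.
  def with_tenant(name)
    pool = @pools.fetch(name) { raise UnknownTenant, "no pool for #{name.inspect}" }
    previous = Thread.current[:tenant_pool]
    Thread.current[:tenant_pool] = pool
    begin
      yield pool
    ensure
      # Restore the previous selection even if the block raises.
      Thread.current[:tenant_pool] = previous
    end
  end

  # The pool selected by the innermost enclosing with_tenant block.
  def current
    Thread.current[:tenant_pool] or raise UnknownTenant, "no tenant selected"
  end
end
```

A request middleware or a Sidekiq middleware would wrap its work in with_tenant, so switching tenants costs a hash lookup rather than a new database handshake.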

Rails - is new instance of Rails application created for every http request in nginx/passenger

I have deployed a Rails app at Engineyard in production and staging environment. I am curious to know if every HTTP request for my app initializes new instance of my Rails App or not?
Rails is stateless, which means each request to a Rails application has its own environment and variables that are unique to that request. So, a qualified "yes", each request starts a new instance[1] of your app; you can't determine what happened in previous requests, or other requests happening at the same time. But, bear in mind the app will be served from a fixed set of workers.
With Rails on EY, you will be running something like thin or unicorn as the web server. This will have a defined number of workers, let's say 5. Each worker can handle only one request at a time, because that's how rails works. So if your requests take 200ms each, that means you can handle approximately 5 requests per second, for each worker. If one request takes a long time (a few seconds), that worker is not available to take any other requests. Workers are typically not created and removed on Engineyard; they are set up and run continuously until you re-deploy, though for something like Heroku, your app may not have any workers (dynos) and if there are no requests coming in it will have to spin up.
[1] I'm defining instance, as in, a new instance of the application class. Each model and class will be re-instantiated and the #request and #session built from scratch.
As far as I understand, no, it will definitely not initialize a new instance for every request. That raises two questions.
How can multiple users simultaneously log in and access my system without interference?
If one user takes up a lot of processing time, how is another user still able to access other features?
The answer to the first question is that HTTP is stateless: everything is stored in the session, which is in a cookie on the client machine and not on the server. So when you send an HTTP request as a logged-in user, the browser sends the required credentials/user information from the client's cookies to the server without the user noticing. Multiple requests are simply queued and served in turn. Because servers are very fast, this feels instantaneous.
For the second question, the answer might be concurrency. The server you are using (nginx, Passenger) can serve multiple requests at the same time. Even if the server is busy with one user's request (say, video processing), it can serve another request on another thread, so multiple users can access the system simultaneously.

Thread-safe way of changing the connection search_paths

I want to be able to switch between different DB schemas in a Rails 4 app.
The plan is to add a new middleware in the very beginning of the stack that will do that for me.
The only way to do it is by setting ActiveRecord::Base.connection.schema_search_path = '"$user",my_schema'.
The problem I have with this is that this connection will go to the pool and all the following requests will use the schema that was set in the first one (basically leaking it through).
So the solution I see is to always reset the search path to what it was before and always set it on each request.
But I don't want to do this because:
99% of the requests will go to the default (public) schema, so executing set search_path to '"$user",my_schema' would be an additional query that could have been avoided
higher risk of leaking (other middleware may establish the connection earlier, or some changes to Rails or gems outside of my control)
All that especially applies to threaded servers, like Puma.
So are there any better alternatives to my solution with a middleware?
Thanks.
When you return connections to the pool, you must ensure the pool runs DISCARD ALL; to reset the connection state.
That will clear any SET ROLE, SET SESSION AUTHORIZATION, session variables, search_path setting, etc.
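The reset-on-return idea can be sketched with a stand-in pool. ResettingPool and its with_connection method are hypothetical, not ActiveRecord's pool; the only assumption about a connection is that it responds to execute(sql). Note that PostgreSQL does not allow DISCARD ALL inside a transaction block, so the reset has to run after the transaction has finished.

```ruby
# Stand-in pool that resets session state whenever a connection is returned.
class ResettingPool
  def initialize(connections)
    @idle = connections.dup
    @lock = Mutex.new
  end

  # Lends a connection to the block and always resets it before it becomes
  # available to the next caller, even if the block raises.
  def with_connection
    connection = @lock.synchronize { @idle.pop }
    raise "pool exhausted" unless connection

    begin
      yield connection
    ensure
      # Clears search_path, SET ROLE, session variables and so on.
      connection.execute("DISCARD ALL")
      @lock.synchronize { @idle.push(connection) }
    end
  end
end
```

With this arrangement the middleware only issues SET search_path for the requests that need a non-default schema; the reset cost is paid once per return to the pool.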

Rails app for multiple domains

I need to create a service with five identical sites and one that unites them, but each has to live on a separate domain. Could I somehow run one instance of the Rails app and redirect users depending on the domain? Or is it better to start a new Rails app instance per site?
Why not just set up one instance of the rails app, and configure your http server (Apache, Nginx or whatever) to listen to connections on all of those domains?
Generally, there are two ways you can do that with one rails instance.
Redirect the incoming requests to different URI paths ('/site1', '/site2' etc.) from the http server based on the domain names. I'm not a pro at setting this up, but I'm sure it is doable.
Redirect the incoming requests to different paths from the application controller in a before filter based on the value of request.host.
redirect_to my_site1_path if request.host == 'www.site1.com'
You can choose either of them, it's up to you :).

Is it okay to authenticate with MongoDB per request?

I have a Rails controller that needs to write some data to my MongoDB. This is what it looks like at the moment.
def index
  data = self.getCheckinData
  dbCollection = self.getCheckinsCollection
  dbCollection.insert(data)
  render(:json => data[:_id].to_s())
end
protected
def getCheckinsCollection
  connection = Mongo::Connection.new('192.168.1.2', 27017)
  db = connection['local']
  db.authenticate('arman', 'arman')
  return db['checkins']
end
Is it okay to authenticate with MongoDB per request?
It is probably unnecessarily expensive and creating a lot more connections than needed.
Take a look at the documentation:
http://www.mongodb.org/display/DOCS/Rails+3+-+Getting+Started
They connect inside an initializer. It does some connection pooling so that connections are re-used.
Is there only one user in the database?
I'd say: don't do the db authentication. If MongoDB server is behind a good firewall, it's pretty secure. And it never ever should be exposed to the internet (unless you know what you're doing).
Also, don't establish a new connection per request. This is expensive. Initialize one on startup and reuse it.
In general, this should be avoided.
If you authenticate per request and you get many requests concurrently, you could have a problem where all connections to the database are taken. Moreover, creating and destroying database connections can use up resources within your database server -- it will add a load to the server that you can easily avoid.
Finally, this approach to programming can result in problems when database connections aren't released -- eventually your database server can run out of connections.
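The "connect once and reuse" advice can be sketched as a lazily initialised, memoized connection. CheckinStore and connector are hypothetical names; the callable assigned to connector is where the connect-and-authenticate code from the question would go, for example in an initializer.

```ruby
# Holds one shared connection for the whole process instead of one per request.
module CheckinStore
  @lock = Mutex.new
  @connection = nil
  @connector = nil

  class << self
    # connector: a callable that opens (and authenticates) a connection.
    attr_writer :connector

    # Opens the connection on first use and returns the same object on every
    # later call; the mutex keeps two threads from connecting at once.
    def connection
      @lock.synchronize { @connection ||= @connector.call }
    end
  end
end
```

The controller action would then call CheckinStore.connection rather than building a new connection, so authentication happens once per process rather than once per request.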
