In my rails application, I have some code like this:
def foo
  if object_bar_exists
    raise "can't create bar twice!"
  end
  Bar.create
end
Which could be invoked by two different requests coming into the application server. If this code is run by two requests simultaneously, and they both run the if check at the same time, neither will find the other's bar, and 2 bars will be created.
What's the best way to create a "mutex" for "the collection of bars"? A special purpose mutex table in the DB?
Update:
I should emphasize that I cannot use a memory mutex here, because the concurrency is across requests/processes and not threads.
The best thing to do is perform your operations in a DB transaction. Because you will probably eventually have multiple application servers running, and they very possibly won't share memory, you won't be able to create a Mutex lock at the application level, especially if those application instances are running on entirely different physical boxes. Here's how to set up the DB transaction:
ActiveRecord::Base.transaction do
  # Transaction code goes here.
end
If you want to ensure a rollback on the DB transaction then you'll have to have validation enabled on the Bar class so that an invalid save request will cause a rollback:
ActiveRecord::Base.transaction do
  bar = Bar.new(params[:bar])
  bar.save!
end
If you already have a bar object in the DB, you can lock that object pessimistically like this:
ActiveRecord::Base.transaction do
  bar = Bar.find(1, :lock => true)
  # perform operations on bar
end
If all the requests are coming into the same machine, and the same Ruby virtual machine, you could use Ruby's built-in Mutex class: Mutex Docs.
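As a minimal sketch of that single-VM case (plain Ruby, no Rails; the `bars` array is a made-up stand-in for the collection of bars), `Mutex#synchronize` makes the check-then-create step atomic across threads:

```ruby
# Without the mutex, several threads could pass the `bars.empty?` check
# before any of them appends, creating duplicate bars.
bar_mutex = Mutex.new
bars = []

threads = 5.times.map do
  Thread.new do
    bar_mutex.synchronize do
      bars << :bar if bars.empty?   # check and create happen as one unit
    end
  end
end
threads.each(&:join)

puts bars.size   # => 1
```

Note that this only protects threads inside one process; as the rest of this answer says, separate processes or machines need a database-level mechanism instead.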
If there are multiple machines or rvms, you will have to use a database transaction to create / get the Bar object, assuming it's stored in the db somehow.
I would probably create a Singleton object in the lib directory, so that you only have one instance of the thing, and use Mutexes to lock on it.
This way, you can ensure only one access to the thing at any given point in time. Of course, any other requests will block on it, so that's something to keep in mind.
For multiple machines, you'd have to store a token in a database and synchronize access to it: the token has to be queried and claimed atomically, with a version number or similar, to ensure two processes cannot take the token at the same time. Or use a centralized locking web service so that your token handling lives in only one spot.
There are a few methods (first_or_create, find_or_create_by, etc.) which work on the principle:
1. talk to the database to try to find the stuff we want
2. if we didn't find it, make it ourselves
3. save it to the db
Clearly, concurrent calls of these methods could have both threads not find what they want, and at step 3 one will unexpectedly fail.
Seems like a better solution is create_or_find. That is:
1. create sensible uniqueness constraints in your DB ahead of time
2. save something if you want to save it
3. if it worked, good
4. if it didn't work because of a RecordNotUnique exception, it's already there; great, load it
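The steps above can be simulated in plain Ruby. The `Store` class here is a made-up stand-in for a table with a unique index; in a real Rails app the rescue would target `ActiveRecord::RecordNotUnique`:

```ruby
# Hypothetical in-memory "table" that enforces a uniqueness constraint.
class DuplicateKeyError < StandardError; end

class Store
  def initialize
    @rows = {}
  end

  def insert!(key)
    raise DuplicateKeyError if @rows.key?(key)   # the unique constraint
    @rows[key] = { key: key }
  end

  def find(key)
    @rows[key]
  end
end

def create_or_find(store, key)
  store.insert!(key)            # optimistic: just try the INSERT
rescue DuplicateKeyError
  store.find(key)               # it already exists; load the winner
end

store = Store.new
first  = create_or_find(store, "bar")
second = create_or_find(store, "bar")
puts first.equal?(second)   # => true  (both callers end up with the same row)
```

The key property is that the uniqueness check lives in the store itself, so two concurrent callers can both attempt the insert and exactly one wins; the loser simply loads the winner's row.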
So in what circumstances would I want to use the Rails built-in stuff and not my own (seemingly more reliable) create_or_find?
After digging in, I'm going to answer my own question.
The documentation for find_or_create_by says:
Please note this method is not atomic, it runs first a SELECT, and
if there are no results an INSERT is attempted. If there are other
threads or processes there is a race condition between both calls and
it could be the case that you end up with two similar records.
Whether that is a problem or not depends on the logic of the
application, but in the particular case in which rows have a UNIQUE
constraint an exception may be raised, just retry:
begin
  CreditAccount.find_or_create_by(user_id: user.id)
rescue ActiveRecord::RecordNotUnique
  retry
end
This, in general, will have better performance than create_or_find.
Consider that create_or_find will require 1 DB trip in the case of success, which will only happen once per unique record. Every other time it will require 2 DB trips (a failed create and a search).
A retried find_or_create will require 3 trips in the case of failure (search, failed create, search again), but that can only happen so many times, in a very small window. Beyond that, every other call to find_or_create for that record will require 1 DB trip.
Therefore the amortized cost of retried find_or_create is better, and reached quickly.
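As a back-of-envelope check of that claim (ignoring the rare 3-trip retry window), assume n calls touch the same unique record and the first call is the one that creates it:

```ruby
n = 1_000
create_or_find_trips = 1 + (n - 1) * 2   # one 1-trip success, then failed create + find every time after
find_or_create_trips = 2 + (n - 1) * 1   # one find-miss + insert, then a single find-hit every time after
puts create_or_find_trips   # => 1999
puts find_or_create_trips   # => 1001
```

So for a record that is created once and then looked up repeatedly, find_or_create approaches 1 trip per call while create_or_find stays at 2.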
Clearly it's not thread-safe by default but they may be designed this way to perform better.
It's way faster to find first and create only if necessary than to have creation fail most of the time and recover from the exception (even though it can be handled).
This discussion may be helpful for you.
Note: not sure if this is the best title for the question / open to suggestions to edit for future value
I have a multi-tenant rails application, which allows clients to use their own custom TLDs. So I can have:
www.clientA.com
www.clientB.com
www.clientC.com
etc....
For better or worse, my database (postgres) has a tenants table, which has approximately 60 columns with various settings & configurations for each tenant. Some are simply flags, and some are large text values.
In application_controller.rb I have some logic to parse the URL, query the tenants table based on the domain, and instantiate a @current_tenant object. That @current_tenant object is available throughout the lifecycle of the page. This happens on every. single. page. of my app.
While I have some heavy caching in place, this design still feels very wrong to me, and I am sure it can be improved.
Is there a best practice for this? What is the best pattern to handle the @current_tenant object? I am concerned about memory management.
Your current_tenant is no different from the current_user that is ubiquitous in Rails apps.
Stop micro-optimizing. Unless you have benchmarked it and it shows to be a major bottleneck (which it is not, I can assure you), don't.
Any minuscule performance improvement (if at all) will be offset by the increased complexity of code, caching problems, and what not.
do. not. do. that. ;)
One thing though: do NOT use a before_filter that assigns @current_tenant; instead add a current_tenant method to the ApplicationController (or one of its concerns) that caches the result for the remainder of the request:
def current_tenant
  @current_tenant ||= ....
end
You could also cache the @current_tenant lookup using the Rails cache, i.e.:
def current_tenant(host = request.host)
  Rails.cache.fetch(host, expires_in: 5.minutes) do
    Tenant.find_by(tld: host)
  end
end
(Note: I haven't touched Rails for 3 years, so take this answer with the appropriate fistful of salt.)
Is there a best practice for this? What is the best pattern to handle the @current_tenant object?
The best way I've seen this kind of stuff implemented is in PHP, rather than Ruby on Rails. More specifically, in the Symfony framework.
In a nutshell, a Symfony app's layout is like so:
/app <-- app-specific Kernel, Console, config, cache, logs, etc.
/src <-- app-specific source files
/vendor <-- vendored source files
/web <-- public web folder for the app
To run multiple apps from the same code base, you'd basically go:
/app
/app2
/...
/src
/vendor
/web
/web2
/...
... and then point each domain to a different /web folder.
As I understand it, it's possible to re-organize a Rails project's directory structure into something more or less equivalent. See this related answer in particular:
https://stackoverflow.com/a/10480207/417194
From there, you could theoretically boot up one instance of your app per tenant, each with their separate resources, hard-coded defaults, etc., while continuing to use a shared code base.
The next best option is, imo, what you're currently doing. Use memcached or equivalent to quickly map specific domain names to tenants, and fall back to a database query as necessary. I imagine you're doing this already, since you've "some heavy caching in place".
As an aside, you might find this related question, and the ones linked within it, interesting:
Multiple applications using a single code base in ruby
(And FWIW, I ended up not sticking with PHP for multi-tenant apps. There was no magic bullet last I played with Ruby or Rails.)
With respect to memory usage, methinks don't worry too much about it: strictly speaking, you're looking up and creating the tenant object a single time per request. Even if doing so was dreadfully slow, cursory monitoring of where your app is actually spending time will reveal that it's negligible compared to innocuous looking pieces of code that get run a gazillion times per request.
I designed an application that is doing basically exactly what you have described. Since you are loading a single object (even if on every page request), just make sure the query only returns the one row (current tenant) and doesn't do a crazy amount of joins. A single row query with a LIMIT applied to it is not going to bring down your site, even if requested hundreds of times a second. And regardless, if you are getting that type of traffic, you will have to scale your server anyways.
One thing that can be done to help, is to make sure your search column is indexed in the database. If you are finding the current tenant by url, index the url column.
Here is an example of what I did. I globalized the variable so this information is available in all controllers/models/views.
In my application controller:
before_filter :allocate_site

private

def allocate_site
  url = request.host_with_port
  url.slice! "www."
  # load the current site by URL
  $current_site = Site.find_by({:url => url, :deleted => false})
  # if the current site doesn't exist, we are going to create a placeholder
  # for the URL hitting the server.
  if $current_site.nil?
    $current_site = Site.new
    $current_site.name = 'New Site'
    $current_site.url = url
    $current_site.save
  end
end
If you don't mind adding extra gems to your application, I would recommend using apartment. This was done exactly to serve your purpose : handle rails application with multiple tenants.
This would present two advantages to your problem:
handling @current_tenant (or whatever you wish to name it) is done through a middleware, so you won't need to set it in your ApplicationController. Take a look at the Switch on domain section of the README to see how it is done. (Note: apartment uses Apartment.current_tenant to refer to your @current_tenant.)
The best part: in most cases, you will not need @current_tenant anymore, since apartment will scope all requests to the appropriate PostgreSQL schema.
I'm diving into Rails 4 and I'm trying to understand how to safely access model data while it's being accessed by multiple DB connections. I have some match-making logic that finds the oldest user in a queue, removes that user from the queue, and returns the user...
# UserQueue.rb
class UserQueue < ActiveRecord::Base
  has_many :users

  def match_user(user)
    match = nil
    if self.users.count > 0
      oldest = self.users.oldest_in_queue
      if oldest.id != user.id
        self.users.delete(oldest)
        match = oldest
      end
    end
    match # the matched user, or nil
  end
end
If two different threads executed this match_user method around the same time, is it possible for them to both find the same oldest user and try to delete it from the queue, and return it to the caller? If so, how do I prevent that?
I looked into transactions, but they don't seem to be a solution since there's only one model being modified in this case (the queue).
Thanks in advance for your wisdom!
ActiveRecord has support for row locking.
This is taken from the Rails guide, locking records for update:
11.1 Optimistic Locking
Optimistic locking allows multiple users to access the same record for edits, and assumes a minimum of conflicts with the data. It does this by checking whether another process has made changes to a record since it was opened. An ActiveRecord::StaleObjectError exception is thrown if that has occurred and the update is ignored.
Optimistic locking column
In order to use optimistic locking, the table needs to have a column called lock_version of type integer. Each time the record is updated, Active Record increments the lock_version column. If an update request is made with a lower value in the lock_version field than is currently in the lock_version column in the database, the update request will fail with an ActiveRecord::StaleObjectError. Example:
c1 = Client.find(1)
c2 = Client.find(1)
c1.first_name = "Michael"
c1.save
c2.name = "should fail"
c2.save # Raises an ActiveRecord::StaleObjectError
You're then responsible for dealing with the conflict by rescuing the exception and either rolling back, merging, or otherwise apply the business logic needed to resolve the conflict.
This behavior can be turned off by setting ActiveRecord::Base.lock_optimistically = false.
To override the name of the lock_version column, ActiveRecord::Base provides a class attribute called locking_column:
class Client < ActiveRecord::Base
  self.locking_column = :lock_client_column
end
I suggest reading this section in the Rails guide.
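As a plain-Ruby sketch of the lock_version mechanism the guide describes, here a Hash stands in for the database row; this is a simulation of the check, not ActiveRecord itself:

```ruby
class StaleObjectError < StandardError; end

# Hypothetical record wrapper: each instance snapshots lock_version at load
# time, and save refuses to write if the row has moved on since then.
class Record
  attr_accessor :name

  def initialize(row)
    @row = row
    @lock_version = row[:lock_version]
    @name = row[:name]
  end

  def save
    # A real UPDATE only matches while our lock_version is still current.
    raise StaleObjectError if @row[:lock_version] != @lock_version
    @row[:name] = @name
    @row[:lock_version] += 1
    @lock_version = @row[:lock_version]
  end
end

row = { name: "original", lock_version: 0 }   # the shared database row
c1 = Record.new(row)
c2 = Record.new(row)

c1.name = "Michael"
c1.save                     # bumps lock_version to 1

c2.name = "should fail"
begin
  c2.save                   # c2 still holds lock_version 0
rescue StaleObjectError
  puts "stale"              # => stale
end
```

The losing writer sees the exception rather than silently clobbering the winner's update, which is exactly the conflict you are then responsible for resolving.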
Yes, that absolutely could happen. How you prevent it depends on the rest of your app/framework/database/etc.
Transactions won't help because two clients could start the request at the same time and both would see the same UserQueue record as oldest.
You want a mutex. But a mutex in the code isn't ideal if there are other ways to modify the data (e.g. directly via SQL). It can also get messy: the first time you forget to use the mutex, you've opened up the race condition again. Then again, maybe this is enough. Just remember your mutex needs to work across threads and processes.
You might see if your database has a mutex or other row level lock you can use to mark the oldest queue record, then extract it.
Or find another way to grab the oldest queue that avoids the race condition entirely. Something like:
SQL to update the oldest queue whose id is not user.id and set "marked_for_work" to some unique ID.
Fetch the queue row whose marked_for_work value is our unique ID.
You could run multiple threads with the above without worry as the SQL update is (well, it should be!) atomic.
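The two-step claim can be simulated in plain Ruby. Here a Mutex stands in for the atomicity a single SQL UPDATE gives you, and `marked_for_work` is the hypothetical column from the steps above:

```ruby
require "securerandom"

# In-memory stand-in for the queue table; each row is a hash.
rows = [
  { id: 1, user_id: 10, marked_for_work: nil },
  { id: 2, user_id: 20, marked_for_work: nil },
]
atomic = Mutex.new

def claim_oldest(rows, atomic, excluding_user_id:)
  token = SecureRandom.uuid
  # Step 1: "UPDATE ... SET marked_for_work = token WHERE user_id != ?
  #          AND marked_for_work IS NULL LIMIT 1" (made atomic here by the mutex)
  atomic.synchronize do
    row = rows.find { |r| r[:marked_for_work].nil? && r[:user_id] != excluding_user_id }
    row[:marked_for_work] = token if row
  end
  # Step 2: fetch the row carrying our unique token
  rows.find { |r| r[:marked_for_work] == token }
end

a = claim_oldest(rows, atomic, excluding_user_id: 99)
b = claim_oldest(rows, atomic, excluding_user_id: 99)
puts [a[:id], b[:id]].inspect   # => [1, 2]  (two callers claim two different rows)
```

Because the mark is written atomically with a per-caller token, no two callers can ever fetch the same row, which is the property the atomic SQL UPDATE gives you for real.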
Is it possible to force ActiveRecord to push/flush a transaction (or just a save/create)?
I have a clock worker that creates tasks in the background for several task workers. The problem is, the clock worker will sometimes create a task and push it to a task worker before the clock worker information has been fully flushed to the db which causes an ugly race condition.
Using after_commit isn't really viable due to the architecture of the product and how the tasks are generated.
So in short, I need to be able to have one worker create a task and flush that task to the db.
ActiveRecord uses #transaction to create a block that begins and either rolls back or commits a transaction. I believe that would help your issue. Essentially (presuming Task is an ActiveRecord class):
new_task = nil
Task.transaction do
  # assigned inside, but declared outside the block so new_task is visible afterwards
  new_task = Task.create(...)
end
BackgroundQueue.enqueue(new_task)
You could also go directly to the #connection underneath with:
Task.connection.commit_db_transaction
That's a bit low-level, though, and you have to be pretty confident about the way the code is being used. #after_commit is the best answer, even if it takes a little rejiggering of the code to make it work. If it won't work for certain, then these two approaches should help.
execute uses async_exec under the hood which may or may not be what you want. You could try using the lower level methods execute_and_clear (or even exec_no_cache) instead.
I'd like to be able to "reserve" an element similar to how an airplane seat is locked for a short period of time before it's actually paid for. I think the best way is to do it through the database and preferably at the ORM layer.
Here's an example:
ActiveRecord::Base.transaction do
  bar = Bar.find(1, :lock => true)
  # do my stuff
end
I need a more flexible solution though.
Here's how I am imagining it to work conceptually:
# action1:
# put an expiring lock (30s) on an element (don't block unrelated code)
# other code
# action2 (after payment):
# come back to the locked element to claim ownership of it
UPDATE: Anyone trying to do this in Rails should try using built-in optimistic locking functionality first.
Add an additional column locked_until - but beware of concurrency. I'd probably do that down on the db layer.
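A sketch of how that locked_until column could behave, simulated on an in-memory object (attribute names are made up; a real version would enforce this at the db layer with one atomic UPDATE whose WHERE clause checks locked_until against the current time):

```ruby
require "securerandom"

class Seat
  def initialize
    @locked_until = Time.at(0)
    @token = nil
    @owner = nil
    @atomic = Mutex.new   # stands in for row-level DB atomicity
  end

  # Try to reserve for ttl seconds; returns a token, or nil if already held.
  def reserve(ttl: 30, now: Time.now)
    @atomic.synchronize do
      return nil if now < @locked_until   # a live, unexpired lock exists
      @locked_until = now + ttl
      @token = SecureRandom.uuid
    end
  end

  # After payment: claim ownership, proving we still hold an unexpired lock.
  def claim(token, now: Time.now)
    @atomic.synchronize do
      return false unless token == @token && now < @locked_until
      @owner = token
      true
    end
  end
end

seat = Seat.new
token = seat.reserve(ttl: 30)
puts seat.reserve.nil?    # => true  (a second reservation is rejected)
puts seat.claim(token)    # => true  (the original holder claims ownership)
```

The expiry means an abandoned reservation frees itself with no cleanup job: once locked_until passes, the next reserve simply overwrites the stale lock.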
I could have a separate table specifically for this purpose called potential_owner. It would have a timestamp, so that one can figure out the timing. Basically it would work something like that:
# lock the table
# check latest record to see if the element is available
# add a new record or die
This is pretty simple to implement, however locking is not fine-grained. The table describes potential ownership of different elements, and a simple check locks down the whole table. In Tass's solution only the row for a particular element is locked.