When a user is created, Authlogic sets a persistence token which it uses for session maintenance. As part of this process, it does a query like:
User Exists (517.6ms) SELECT 1 AS one FROM 'users' WHERE 'users'.'persistence_token' = BINARY 'xyz123123' AND 'users'.'deleted_at' IS NULL LIMIT 1
which, as you can see, is pretty expensive on our database. The code that triggers it is here: https://github.com/binarylogic/authlogic/blob/4f03d6520d8b13394023f5cbc9ba74ab1464b89d/lib/authlogic/session/session.rb
This is particularly problematic for our user sync function, which creates thousands of users, so the total cost of this query becomes very high.
How can we disable this behavior during this operation and set the token later? I tried user.save_without_session_maintenance, but that didn't seem to work for me.
What I'm going for is URLs very similar to Basecamp's:
https://3.basecamp.com/4000818/buckets/7452203/message_boards/1039416768
I have already achieved this functionality by following this guide, but I am unsatisfied with the process of needing to run potentially millions of .exists? lookups to find an open number and fear this will very quickly hamper performance of my app.
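def set_hash_id
  hash_id = nil
  loop do
    # Replace the "-" and "_" that urlsafe_base64 can emit with a random lowercase letter.
    hash_id = SecureRandom.urlsafe_base64(9).gsub(/-|_/, ('a'..'z').to_a[rand(26)])
    # Keep generating until no existing record uses this value.
    break unless self.class.where(:hash_id => hash_id).exists?
  end
  self.hash_id = hash_id
end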
I find it hard to believe that Basecamp is relying on something so inefficient on every record save and I'm looking to find out how they do it or to find a setup that will look the same but without the overhead of the linked tutorial.
I'd appreciate any input on methods to generate a non-sequential record ID. I am not interested in UUIDs, as I can't stand the unpleasant URLs they generate. Also, the IDs must be integers. Basically, exactly like the Basecamp URL but without the overhead of the exists? checks. Is it possible they are combining numbers with an encoded timestamp or something similar to ensure there are no collisions? I have explored the hashids.org method, but it does not generate integer-only hashes.
I am using Postgres as my database, in case this is helpful.
Efficiency-wise I think you should be fine. GitLab also uses something similar for unique token generation.
There's another issue though that's worth considering:
Your method does not guarantee to generate a unique key, as the operation is not atomic (neither is GitLab's). Between checking for uniqueness and writing the record to the database the same key could have been generated.
You have at least 2 options to handle this. Both solutions should also be more efficient (which is your main concern).
Catch the DB's unique key constraint violation on save
def save
  begin
    self.hash_id = generate_hash_id
    super
  rescue ActiveRecord::RecordNotUnique => e
    # 1. you may need to check the message for the specific constraint
    # 2. you may need to implement an abort condition to prevent infinite retries
    retry
  end
end
You can also do this in an ActiveRecord callback.
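The abort condition mentioned in the comments above can be sketched in plain Ruby. This is only an illustration, not ActiveRecord code: FakeTable, UniqueViolation, MAX_ATTEMPTS and insert_with_unique_id are hypothetical names, with UniqueViolation standing in for ActiveRecord::RecordNotUnique and FakeTable standing in for a table with a unique index on hash_id.

```ruby
require "securerandom"
require "set"

# Hypothetical stand-in for ActiveRecord::RecordNotUnique.
class UniqueViolation < StandardError; end

# Hypothetical in-memory "table" with a unique index on hash_id.
class FakeTable
  def initialize
    @keys = Set.new
  end

  # Set#add? returns nil when the key is already present, which plays
  # the role of the database rejecting a duplicate value.
  def insert(hash_id)
    raise UniqueViolation, "duplicate hash_id #{hash_id}" unless @keys.add?(hash_id)
    hash_id
  end
end

MAX_ATTEMPTS = 5 # assumed limit; tune it for your key space

# Try to insert a freshly generated key and let the "database" detect
# collisions. Gives up after MAX_ATTEMPTS instead of retrying forever.
def insert_with_unique_id(table, generator: -> { SecureRandom.random_number(10**10) })
  attempts = 0
  begin
    attempts += 1
    table.insert(generator.call)
  rescue UniqueViolation
    retry if attempts < MAX_ATTEMPTS
    raise
  end
end
```

In the common case this costs a single write and no read, because the uniqueness check happens inside the index rather than in a separate exists? query.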
Have the DB generate the key
An alternative solution would be to have your database generate the unique key on create. A function like the one described in this blogpost A Better ID Generator For PostgreSQL may serve your purpose better.
This solution has the advantage that your application code does not need to be concerned about generating keys or catching collisions. The drawback, though, is that this solution is database-specific.
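To make the "encoded timestamp" idea from the question concrete, here is a rough sketch of the arithmetic such a generator could perform: pack a millisecond timestamp, a shard number and a per-shard sequence value into one integer. The bit widths, the custom epoch and the name compose_id are assumptions made for illustration; they are not taken from the linked post, and a real implementation would live in a database function rather than in Ruby.

```ruby
# All constants below are illustrative assumptions.
CUSTOM_EPOCH_MS = 1_400_000_000_000 # arbitrary starting point in Unix milliseconds
SHARD_BITS      = 13                # room for 8192 shards
SEQ_BITS        = 10                # 1024 ids per shard per millisecond

# Pack timestamp, shard and sequence into a single integer.
# The timestamp occupies the high bits, so ids grow over time.
def compose_id(now_ms, shard_id, seq)
  time_part  = (now_ms - CUSTOM_EPOCH_MS) << (SHARD_BITS + SEQ_BITS)
  shard_part = (shard_id % (1 << SHARD_BITS)) << SEQ_BITS
  seq_part   = seq % (1 << SEQ_BITS)
  time_part | shard_part | seq_part
end
```

Ids built this way are integers and need no lookup for collisions, but they are time-ordered rather than random, so they hide the record count without being unguessable.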
I have some code like this:
u = ... some user ...
u.clubs << Club.new(:name => "Stu's house of Disco")
ClubMemberships is a join model, and a record for it is created automatically by the code above.
So far, so good. However, users have an attribute that is kept in memory, not in the database, for security reasons, and when the "<<" method fires, it reloads the user from the database, and thus blows up some code in ClubMemberships#after_create that depends on the user having its secret decoder ring intact, which it does not have when freshly loaded from the database. This seems a bit strange: why is it loading the user when we have a perfectly good one sitting right there? More importantly, is there a way to work around this, or are we going to have to simply create our own add_club method for the user?
Here's what we did. It's an ugly hack, and superior solutions would be welcome.
We created a @@secret_decoder_ring class variable in User with a hash linking the object's ID and the secret, non-DB piece of information, so that if the object in question has been seen and had the secret data filled in, it will be visible in the future.
In reality, we only want the visibility to be the duration of a request processed in Rails, so we have an after_filter that wipes the data in question.
Ugly? Yes, but it does the job for the time being.
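A minimal sketch of the same idea in plain Ruby follows. SecretStore and its methods are hypothetical names, and the sketch deliberately swaps the class variable for Thread.current storage (per thread, strictly per fiber) so that requests handled on other threads do not see each other's secrets. In Rails, clear! would be called from the after_filter mentioned above.

```ruby
# Hypothetical request-scoped store mapping a record id to its
# in-memory-only secret.
module SecretStore
  # The hash lives in Thread.current, so each thread has its own copy.
  def self.data
    Thread.current[:secret_store] ||= {}
  end

  def self.put(id, secret)
    data[id] = secret
  end

  def self.fetch(id)
    data[id]
  end

  # Wipe everything at the end of the request.
  def self.clear!
    Thread.current[:secret_store] = nil
  end
end
```

A freshly reloaded User could then look up its secret with SecretStore.fetch(id) instead of relying on the instance that was discarded.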
In my rails application, I have some code like this:
def foo
  if object_bar_exists
    raise "can't create bar twice!"
  end
  Bar.create
end
This could be invoked by two different requests coming into the application server. If this code is run by two requests simultaneously, and they both run the if check at the same time, neither will find the other's bar, and 2 bars will be created.
What's the best way to create a "mutex" for "the collection of bars"? A special purpose mutex table in the DB?
update
I should emphasize that I cannot use a memory mutex here, because the concurrency is across requests/processes and not threads.
The best thing to do is perform your operations in a DB transaction. Because you will probably eventually have multiple applications running and they very possibly won't share memory, you won't be able to create a Mutex lock on the application level, especially if those two application services are running on entirely different physical boxes. Here's how to accomplish the DB transaction:
ActiveRecord::Base.transaction do
  # Transaction code goes here.
end
If you want to ensure a rollback on the DB transaction, then you'll need validations on the Bar class, so that an invalid save! raises an exception and causes a rollback:
ActiveRecord::Base.transaction do
  bar = Bar.new(params[:bar])
  bar.save! # raises on validation failure, which rolls the transaction back
end
If you already have a bar object in the DB, you can lock that object pessimistically like this:
ActiveRecord::Base.transaction do
  # Rails 3+ syntax; on older Rails versions this was Bar.find(1, :lock => true)
  bar = Bar.lock.find(1)
  # perform operations on bar
end
If all the requests are coming into the same machine, and the same Ruby virtual machine, you could use Ruby's built-in Mutex class (see the Mutex docs).
If there are multiple machines or Ruby VMs, you will have to use a database transaction to create / get the Bar object, assuming it's stored in the DB somehow.
I would probably create a Singleton object in the lib directory, so that you only have one instance of the thing, and use Mutexes to lock on it.
This way, you can ensure only one access to the thing at any given point in time. Of course, any other requests will block on it, so that's something to keep in mind.
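As a rough sketch of that Singleton-plus-Mutex approach: BarRegistry and create_once are hypothetical names, and Object.new stands in for Bar.create. As the question's update points out, this only protects threads inside one Ruby process, not separate processes or machines.

```ruby
require "singleton"

# Hypothetical single-process guard around "create the bar at most once".
class BarRegistry
  include Singleton

  def initialize
    @lock = Mutex.new
    @bar  = nil
  end

  # The check and the creation happen inside one critical section,
  # so two threads can never both pass the check.
  def create_once
    @lock.synchronize do
      raise "can't create bar twice!" if @bar
      @bar = Object.new # stand-in for Bar.create
    end
  end
end
```

Callers go through BarRegistry.instance.create_once; the first caller gets the new object and every later caller gets the exception, regardless of how the threads interleave.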
For multiple machines, you'd have to store a token in a database, and synchronize some kind of access to the token. Like, the token has to be queried and pulled out, and keep track of some number or something to ensure that people cannot remove the token at the same time. Or use a centralized locking web-service so that your token handling is only in one spot.
I have a Postgres database (9) that I am writing a trigger for. I want the trigger to set the modification time, and user id for a record. In Firebird you have a CONNECTIONID that you can use in a trigger, so you could add a value to a table when you connect to the database (this is a desktop application, so connections are persistent for the lifetime of the app), something like this:
UserId | ConnectionId
---------------------
544 | 3775
and then look up in the trigger that connectionid 3775 belongs to userid 544 and use 544 as the user that modified the record.
Is there anything similar I can use in Postgres?
You could use the process ID. It can be retrieved with:
pg_backend_pid()
With this pid you can also use the pg_stat_activity view to get more information about the current backend, although you should already know everything, since you are using this backend.
Or better: just create a sequence, and retrieve one value from it for each connection:
CREATE SEQUENCE connectionids;
And then:
SELECT nextval('connectionids');
in each connection, to retrieve a connection-unique ID.
One way is to use the custom_variable_classes configuration option. It appears to be designed to allow the configuration of add-on modules, but can also be used to store arbitrary values in the current database session.
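Something along the lines of the following needs to be added to postgresql.conf (this applies up to PostgreSQL 9.1; the option was removed in 9.2, where settings with a custom prefix can be used without it):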
custom_variable_classes = 'local'
When you first connect to the database you can store whatever information you require in the custom class, like so:
SET local.userid = 'foobar';
Later on, you can retrieve this value with the current_setting() function:
SELECT current_setting('local.userid');
Adding an entry to a log table might look something like this:
INSERT INTO audit_log VALUES (now(), current_setting('local.userid'), ...)
While it may work for your desktop use case, note that process ID numbers do roll over (32768 is a common upper limit), so using them as a unique key to identify a user can run into problems. If you ever end up with leftover data from a previous session in the table that's tracking the user->process mapping, it can collide with newer connections assigned the same process ID once it has rolled over. It may be sufficient for your app to just make sure you aggressively clean out old mapping entries, perhaps at startup time given how you've described its operation.
To avoid this problem in general, you need to make a connection key that includes an additional bit of information, such as when the session started:
SELECT procpid,backend_start FROM pg_stat_activity WHERE procpid=pg_backend_pid();
That has to iterate over all of the connections active at the time to compute, so it does add a bit of overhead. It's possible to execute that a bit more efficiently starting in PostgreSQL 8.4:
SELECT procpid,backend_start FROM pg_stat_get_activity(pg_backend_pid());
But that only really matters if you have a large number of connections active at once.
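The effect of adding the start time to the key can be illustrated with a small Ruby sketch. The name connection_key and the sample timestamps are made up for the example; the pid 3775 and user 544 come from the question.

```ruby
# Hypothetical composite key: a process id alone can be reused after
# rollover, but the pair (pid, backend_start) identifies one session.
def connection_key(pid, backend_start)
  [pid, backend_start]
end

# A leftover mapping row from an old session that had pid 3775.
stale_session = connection_key(3775, Time.utc(2011, 1, 1, 9, 0, 0))
user_by_connection = { stale_session => 544 }

# A new session that was assigned the same pid after rollover.
fresh_session = connection_key(3775, Time.utc(2011, 1, 2, 9, 0, 0))

# Looking up by pid alone would wrongly return user 544; the composite
# key finds no match, so the stale row cannot be attributed to the new session.
user_by_connection[fresh_session] # => nil
```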
Use current_user if you need the database user (I'm not sure that's what you want by reading your question).
I have a DateTime LastSeen property that stores in the database when the user was last seen.
One option I have in mind is to update the database when validating the user at login.
Another option is to update the database every 20 minutes, but where do I put this logic in ASP.NET MVC? Do I need to store a last-update time in a cookie and check that? Where would I check this cookie other than in the Global.asax file?
How do other systems do it?
Personally, I would take a page out of Google Analytics' book and run this client side. To get there:
a) Set up a Handler/Action/something that takes HTTP requests to handle recording user "seen" activities
b) Set up an AJAX call to (a) to record activities at a reasonable interval from the client.
This will let you get to a much better answer to the question "what if bob just opened the site, saw he didn't have any messages and went on browsing [whatever]"
I think, as you suggest, updating that value when the user logs on would be simplest.
If your model also has CreatedOn, CreatedBy, ModifiedOn and ModifiedBy properties, you can also query these values with a join onto the user table to see if the user has been active elsewhere in the app, but this may not perform well, as you'll need a join on every table in your database.
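The "update at most every 20 minutes" idea from the question comes down to one comparison, sketched here in Ruby to match the other examples on this page. The names UPDATE_INTERVAL and last_seen_update_due? are hypothetical, and where the last write time is kept (cookie, session or cache) is left open. In ASP.NET MVC the same check could run in an action filter instead of Global.asax.

```ruby
UPDATE_INTERVAL = 20 * 60 # seconds; the 20-minute window from the question

# Returns true when LastSeen should be written again: either it has
# never been written, or the previous write is at least one interval old.
def last_seen_update_due?(last_written_at, now = Time.now)
  last_written_at.nil? || (now - last_written_at) >= UPDATE_INTERVAL
end
```

Each request would call this with the stored timestamp and only touch the database when it returns true, which caps the writes at one per user per interval.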