I have a private instance method in my User model, like below:
def generate_authentication_token
  loop do
    token = Devise.friendly_token
    break token unless self.class.unscoped.where(authentication_token: token).first
  end
end
Let us assume the number of user records is 100 trillion. If I run the above method and the token generated on each iteration matches an existing record, the loop could iterate 100 trillion times, and the DBMS would have to search for a record on every iteration. Is there any way to make this faster by reducing iterations and DB hits (given that each iteration finds a matching record)? And seriously, sorry if the question makes no sense; let me know and I will delete it ASAP. Thanks. Happy coding.
The code looks fine. find_by(authentication_token: token) is generally preferred for retrieving a single record.
Make sure you add an index for the authentication_token column. That would drastically speed up the query if you have many records. If you don't know about database indexing, see this answer.
I wouldn't worry about optimizing the number of iterations or db hits, as the chances that you will find a matching token even on the first try are slim.
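For reference, adding that index in a migration might look something like this (a sketch; the users table name and the choice of a unique index are assumptions):

class AddIndexToUsersAuthenticationToken < ActiveRecord::Migration
  def change
    # a unique index both speeds up the lookup and enforces uniqueness at the database level
    add_index :users, :authentication_token, unique: true
  end
end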
Why not use something like this?
token = Digest::MD5.hexdigest(rand.to_s << Time.now.to_i.to_s)
Here I am creating a random number, appending the current timestamp to it, and taking the MD5 hash of that string. That will be a pretty good secret token, and it will be unique.
If you are still not satisfied with the uniqueness, then just add the ID of the user at the start.
There is no query involved in it.
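For instance, prepending the record's ID might look like this (a sketch; it assumes the token is generated after the user already has an ID):

token = "#{user.id}-#{Digest::MD5.hexdigest(rand.to_s << Time.now.to_i.to_s)}"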
You are looking to create a random yet unique token. Sounds like a UUID to me. You can try https://github.com/assaf/uuid
Your approach will work, but it will be inefficient for the 100 trillion users you are aiming at :-)
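If you would rather not add a dependency, Ruby's standard library can also generate UUIDs (a minimal sketch):

require 'securerandom'
token = SecureRandom.uuid # e.g. "2d931510-d99f-494a-8c67-87feb05e1594"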
This is a follow-up to the last question I asked: Sort Users by Number of Followers. That code is:
@ordered_users = User.all.sort{ |a, b| b.followers.count <=> a.followers.count }
What I hope to accomplish is take the ordered users and get the top 100 of those and then randomly choose 5 out of that 100. Is there a way to accomplish this?
Thanks.
users_in_descending_order_of_followers = User.all.sort_by { |u| -u.followers.count }
sample_of_top = users_in_descending_order_of_followers.take(100).sample(5)
You can use sort_by, which can be easier to use than sort, and combine take and sample to get the top 100 users and pick 5 of them at random.
User.all.sort can potentially pose problems in the long run, depending on the total number of users and the available resources, particularly memory. It will also be a lot slower, because you call .followers.count twice inside the sort block, which issues roughly 2N extra DB queries, N being the number of users. This is because User.all.sort executes the User.all query immediately, fetching every User record into memory, as opposed to a bare User.all, which is lazily loaded until you, for example, call .each (or better yet .find_each) somewhere down the line.
I suggest something like the following (I extended Deekshith's answer, referring to your link to the other question):
User.joins(:followers).group('users.id').order('count(followers.user_id) DESC').limit(100).sample(5)
.joins, .group, .order, and .limit above are all combined into a single SQL query, which is executed once; then .sample(5) (no longer SQL, just a plain Ruby method at this point) yields the result you need.
I would strongly consider using a counter cache on the User model, to hold the count of followers.
This would give a very small performance impact on adding or removing followers, and greatly increase performance when performing sorts:
User.order(followers_count: :desc)
This would be particularly noticeable if you wanted the top-n users by follower count, or finding users with no followers.
User.order(followers_count: :desc).limit(100).sample(5)
This method will outperform approaches using count(*). Add an index on followers_count for best effect.
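Setting up the counter cache might look something like this (a sketch; the Follow join model and the association names are assumptions):

# app/models/follow.rb -- assumed join model between a follower and the followed user
class Follow < ActiveRecord::Base
  belongs_to :user, counter_cache: :followers_count
end

# migration adding and indexing the cached column
class AddFollowersCountToUsers < ActiveRecord::Migration
  def change
    add_column :users, :followers_count, :integer, default: 0, null: false
    add_index :users, :followers_count
  end
end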
Internally, my website stores Users in a database indexed by an integer primary key.
However, I'd like to associate Users with a number of unique, difficult-to-guess identifiers that will be each used in various circumstance. Examples:
One for a user profile URL: So a User can be found and displayed by a URL that does not include their actual primary key, preventing the profiles from being scraped.
One for a no-login email unsubscribe form: So a user can change their email preferences by clicking through a link in the email without having to login, preventing other people from being able to easily guess the URL and tamper with their email preferences.
As I see it, the key characteristics I'll need for these identifiers are that they are not easily guessed, that they are unique, and that knowing one of them will not make it easy to find the others.
In light of that, I was thinking about using SecureRandom::urlsafe_base64 to generate multiple random identifiers whenever a new user is created, one for each purpose. As they are random, I would need to do database checks before insertion in order to guarantee uniqueness.
Could anyone provide a sanity check and confirm that this is a reasonable approach?
The method you are using relies on a secure random generator, so guessing the next URL, even knowing one of them, will be hard. When generating random sequences this is a key aspect to keep in mind: non-secure random generators can become predictable, and knowing one value can help predict what the next one will be. You are probably OK on this one.
Also, the documentation for urlsafe_base64 says the default random length is 16 bytes. This gives you 8^16 different possible values (2.81474977 × 10^14). This is not a huge number. For example, it means that a scraper doing 10,000 requests a second would be able to try all possible identifiers in about 900 years. That seems acceptable for now, but computers keep getting faster, and depending on the scale of your application this could become a problem in the future. Simply making the first parameter bigger solves this, though.
Lastly, something you should definitely consider: the possibility of your database being leaked. Even if your identifiers are bulletproof, your database might not be, and an attacker might get a list of all identifiers. You should definitely hash the identifiers in the database with a secure hashing algorithm (with appropriate salts, the same as you would do for a password). Just to give you an idea of how important this is: with a recent GPU, SHA-1 can be brute-forced at a rate of 350,000,000 tries per second. A 16-byte key (the default for the method you are using) hashed using SHA-1 would be guessed in about 9 days.
In summary: the algorithm is good enough, but increase the length of keys and hash them in the database.
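A minimal sketch of those two suggestions (the column name, the 32-byte length, and the use of an unsalted SHA-256 digest are assumptions, not the only way to do it):

require 'securerandom'
require 'digest'

raw_token = SecureRandom.urlsafe_base64(32) # 32 random bytes instead of the default 16
digest    = Digest::SHA256.hexdigest(raw_token)
user.update_column(:unsubscribe_token_digest, digest)
# send raw_token to the user (e.g. in the unsubscribe link); only the digest is stored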
Because the generated IDs will not be related to any other data, they will be very hard (practically impossible) to guess. To quickly validate their uniqueness and to find users, you'll have to index them in the DB.
You'll also need to write a function that returns a unique ID after checking for uniqueness, something like:
def generate_id(field_name)
  found = false
  until found
    rnd = SecureRandom.urlsafe_base64
    # unique only when no user already has this value in the given column
    found = !User.exists?(field_name => rnd)
  end
  rnd
end
As a last security check, verify the correspondence between an identifier and the user's information (at least the email) before making any changes.
That said, it seems a good approach to me.
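For completeness, the token assignment could be wired up in a callback (a sketch; the column names are assumptions, and generate_id is assumed to be reachable from the model):

class User < ActiveRecord::Base
  before_create :assign_tokens

  private

  def assign_tokens
    self.profile_token     = generate_id(:profile_token)
    self.unsubscribe_token = generate_id(:unsubscribe_token)
  end
end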
I have a Rails 3 application that currently shows a single "random" record with every refresh; however, it repeats records too often, or never shows some records at all. I was wondering what a good way would be to loop through the records and display them such that all get shown before any are repeated. I was thinking of somehow using cookies or session IDs to sequentially loop through the record IDs, but I'm not sure whether that would work or exactly how to go about it.
The database consists of a single table with a single column, and currently only about 25 entries, but more will be added. IDs are generated automatically and are sequential.
Some suggestions would be appreciated.
Thanks.
The funny thing about 'random' is that it doesn't usually feel random when you get the same answer twice in short succession.
The usual answer to this problem is to generate a queue of responses, and make sure when you add entries to the queue that they aren't already on the queue. This can either be a queue of entries that you will return to the user, or a queue of entries that you have already returned to the user. I like your idea of using the record ids, but with only 25 entries, that repeating loop will also be annoying. :)
You could keep track of the queue of previous entries in memcached if you've already got one deployed or you could stuff the queue into the session (it'll probably just be five or six integers, not too excessive data transfer) or the database.
I think I'd avoid the database, because it sure doesn't need to be persistent, it doesn't need to take database bandwidth or compute time, and using the database just to keep track of five or six integers seems silly. :)
UPDATE:
In one of your controllers (maybe ApplicationController), add something like this to a method that you run in a before_filter:
class ApplicationController < ActionController::Base
  before_filter :find_quip

  def find_quip
    last_quip_id = session[:quip_id] || Quips.first.id
    # find_by_id returns nil (instead of raising) when the next id is missing,
    # so the fallback to the first record actually kicks in
    next_quip = Quips.find_by_id(last_quip_id + 1) || Quips.first
    session[:quip_id] = next_quip.id
  end
end
I'm not entirely happy with the wrap-around logic: if there is ever a hole in the id sequence, it will wrap back to the first quip too early. Which is probably going to happen someday. And it will start on number 2. But I'm getting too tired to sort it out. :)
If there are only going to be a small number of records, like you say, you could store the entire array of IDs as a session variable, with another variable for the current index, and loop through them sequentially, incrementing the index.
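A rough sketch of that idea (it reuses the Quips model name from the update above):

def next_quip
  # shuffle the ids once per session, then walk through them in order
  session[:shuffled_ids] ||= Quips.all.map(&:id).shuffle
  session[:position]     ||= 0

  id = session[:shuffled_ids][session[:position]]
  session[:position] = (session[:position] + 1) % session[:shuffled_ids].length
  Quips.find(id)
end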
I'm a little baffled by this.
My end goal in a RoR project is to grab a single random profile from my database.
I was thinking it would be something like:
@profile = Profile.find_by_user_id(rand(User.count))
It kept throwing an error because user_id 0 doesn't exist, so I pulled parts of it out just to check out what's going on:
@r = rand(User.count)
<%= @r %>
This returns 0 every time. So what's going on? I registered 5 fake users and 5 related profiles to test this.
If I take Profile.find_by_user_id(rand(User.count)) and rewrite it as
Profile.find_by_user_id(3)
it works just fine.
User.count is working too. So I think that rand() can't take an input other than a static integer.
Am I right? What's going on?
Try:
Profile.first(:offset => rand(Profile.count))
As a database ages, especially one with user records, there will be gaps in your ID sequence. Trying to grab an ID at random has the potential to fail, because you might randomly pick one that was deleted.
Instead, if you count the number of records and then jump to a random offset into the table, you sidestep the possibility of missing IDs and only land on existing records.
The following example from the OP's question could run into problems unless the integrity of the database is watched very carefully:
profile = Profile.find_by_user_id(rand(User.count))
The problem is that there's a possibility for the User table to be out of sync with the Profile table, leaving them with different numbers of records. For instance, if User.count is 3 and there are only two records in Profile, there's the potential for a failed lookup, resulting in an exception.
I'm not sure why rand(i) isn't working as you expect (it works fine for me), but this isn't a good way to find a random profile regardless; if a profile is ever deleted, or there are any users without profiles, then this will fail.
I don't think there's an efficient way to do this in Rails using ActiveRecord. For a small number of users, you could just do Profile.all and pick a random profile from that array, but you'd probably be better off doing something like
@profile = Profile.find_by_sql("SELECT * FROM profiles ORDER BY RAND() LIMIT 1").first
There are many other questions on StackOverflow about how to select a random record in SQL; I'd say this is the easiest, but if you're concerned about efficiency then have a look around and see if there's another implementation you like better.
EDIT: find_by_sql returns an array, so you need to do .first to get a single Profile.
When I want to get a random record in Rails, I go something like this:
@profile = Profile.first(:order => "RAND()")
This usually works, but from what I've read earlier, the RAND() command is specific to MySQL or at least not database independent. Others might use RANDOM().
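For example, on PostgreSQL or SQLite the equivalent would be (a sketch):

@profile = Profile.first(:order => "RANDOM()")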
Finding by ID in Rails is redundant
@profile = Profile.find_by_user_id(rand(User.count))
# This is redundant; all you need to write is:
@profile = Profile.find(rand(User.count)) # Rails' default finder looks up by ID
The error message based on 0 is probably because of Rails' convention-over-configuration sensible defaults, which are just rules people agree upon.
And when using Rails there is no reason to have a user with ID 0; IDs always start at 1, which I attribute to DHH trying to be more readable.
I need to find all records for a particular resource and display them in a random order, but with consistent pagination (you won't see the same record twice if you start paging). The display order should be randomized each time a user visits a page. I am using will_paginate. Any advice?
Store a random number in a session cookie, then use that as the seed for your database's random function. It will stay the same until the user closes their browser, so they will see a random but consistent order:
Get a large, random number:
cookies[:seed] = SecureRandom.random_number.to_s[2..20].to_i
Use this seed with e.g. MySQL:
SomeModel.order("RAND(#{cookies[:seed].to_i})") # .to_i keeps the interpolated seed safe
This is not standard as far as I know. I can see a use for it, for instance for online tests.
I would suggest using a list per session/user. When a user first visits the page, you determine a list of IDs in a random order, and on all subsequent views you use this list to show the records in the correct order for that user/session.
I hope the number of rows is limited, in which case this makes sense, for instance for tests. Also, if a user leaves a test before finishing it completely, they could continue where they left off. But maybe that is not relevant for you.
Hope this helps.
If you're using a database such as MySQL that has a randomize function such as RAND(), you can just add that to your pagination query like so:
Resource.paginate( ... :order => "RAND()" ... )
Check out some of the comments here regarding performance concerns: https://rails.lighthouseapp.com/projects/8994/tickets/1274-patch-add-support-for-order-random-in-queries
Not sure if you still need help with this. One solution I've used in the past is to run the query with RAND but without pagination at first, then store those record IDs and use that stored list to look up and paginate from there. The initial RAND query could be set to run only when the page is 1 or nil. Just a thought.
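A rough sketch of that approach (the Resource model name, MySQL's RAND(), and manual slicing instead of will_paginate's array helper are assumptions):

def index
  # randomize once, on the first page (or when nothing is stored), and remember the order
  if params[:page].blank? || params[:page].to_i == 1 || session[:random_ids].blank?
    session[:random_ids] = Resource.order("RAND()").map(&:id)
  end

  page     = (params[:page] || 1).to_i
  per_page = 10
  page_ids = session[:random_ids].slice((page - 1) * per_page, per_page) || []

  # load this page's records, preserving the remembered random order
  @resources = Resource.where(id: page_ids).sort_by { |r| page_ids.index(r.id) }
end

With many records you would want to keep the ID list server-side (cache or a table) rather than in a cookie-based session.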
I ended up with this solution, which worked for me on Postgres:
session[:seed] ||= rand() # setseed expects a value in [-1, 1]; rand() returns [0, 1)
seed = session[:seed]

# seed Postgres' random() for this connection, then page through a stable random order
Product.select("setseed(#{seed})").first
Product.order('random()').limit(10).offset(params[:offset])