I'm a little baffled by this.
My end goal in a RoR project is to grab a single random profile from my database.
I was thinking it would be something like:
@profile = Profile.find_by_user_id(rand(User.count))
It kept throwing an error because user_id 0 doesn't exist, so I pulled parts of it out just to check out what's going on:
@r = rand(User.count)
<%= @r %>
This returns 0 every time. So what's going on? I registered 5 fake users and 5 related profiles to test this.
If I take Profile.find_by_user_id(rand(User.count)) and rewrite it as
Profile.find_by_user_id(3)
it works just fine.
User.count is working too. So I think that rand() can't take an input other than a static integer.
Am I right? What's going on?
Try:
Profile.first(:offset => rand(Profile.count))
As a database ages, especially one with user records, gaps appear in the ID sequence. Trying to grab an ID at random has the potential to fail, because you might pick one that was deleted.
Instead, if you count the number of records, then randomly go to some offset into the table you will sidestep the possibility of having missing IDs, and only be landing on existing records.
The following example from the OP's question could run into problems unless the integrity of the database is watched very carefully:
profile = Profile.find_by_user_id(rand(User.count))
The problem is, there's a possibility for the User table to be out of sync with the Profile table, allowing them to have different numbers of records. For instance, if User.count was 3 and there were only two records in Profile, there's the potential for a failed lookup resulting in an exception.
I'm not sure why rand(i) isn't working as you expect (it works fine for me), but this isn't a good way to find a random profile regardless; if a profile is ever deleted, or there are any users without profiles, then this will fail.
I don't think there's an efficient way to do this in Rails using ActiveRecord. For a small number of users, you could just do Profile.find_all() and select a random profile from that array, but you'd probably be better off doing something like
@profile = Profile.find_by_sql("SELECT * FROM profiles ORDER BY RAND() LIMIT 1").first
There are many other questions on StackOverflow about how to select a random record in SQL; I'd say this is the easiest, but if you're concerned about efficiency then have a look around and see if there's another implementation you like better.
EDIT: find_by_sql returns an array, so you need to do .first to get a single Profile.
When I want to get a random record in Rails, I go something like this:
@profile = Profile.first(:order => "RAND()")
This usually works, but from what I've read earlier, the RAND() command is specific to MySQL or at least not database independent. Others might use RANDOM().
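One hedged way to stay portable across databases is to pick the fragment from the adapter name, since MySQL spells it RAND() while PostgreSQL and SQLite use RANDOM(). This is only a sketch; the helper name and the idea of reading the adapter string from ActiveRecord's connection are assumptions, not part of the original answer:

```ruby
# Returns the SQL random-ordering fragment for a given adapter name.
# "mysql2", "postgresql", "sqlite3" are the usual ActiveRecord values.
def sql_random(adapter)
  adapter.to_s =~ /mysql/i ? "RAND()" : "RANDOM()"
end

# Hypothetical usage inside a Rails app:
#   Profile.order(sql_random(ActiveRecord::Base.connection.adapter_name)).first
puts sql_random("mysql2")      # => RAND()
puts sql_random("postgresql")  # => RANDOM()
```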
Finding by ID in Rails is redundant
@profile = Profile.find_by_user_id(rand(User.count))
# This is redundant; all you need to code is:
@profile = Profile.find(rand(User.count)) # Rails finds by ID by default
The error based on 0 is because rand(n) returns an integer between 0 and n - 1, so rand(User.count) can produce 0, while Rails IDs by convention start at 1.
And when using Rails there is no reason to have a user with ID 0; auto-increment sequences always start at 1, which I attribute to DHH favoring readability.
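To see the off-by-one for yourself, a quick standalone check of rand's range (plain Ruby, no Rails needed):

```ruby
# rand(n) yields integers in 0...n: 0 is a possible result,
# while n itself never is.
samples = 1_000.times.map { rand(5) }

puts samples.min  # 0 will almost certainly appear in 1000 draws
puts samples.max  # never exceeds 4

# To map onto 1-based IDs you would have to shift the range:
id = rand(5) + 1  # always between 1 and 5
puts id
```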
Related
My first StackOverflow question, so pardon if it's a little rough around the edges. I recently began my first engineering job and have inherited some legacy Ruby on Rails code to work through.
My goal:
is to fetch posts (this is a model, though with no association to user) belonging to a user, as seen below. The posts should be filtered to only include those whose end_date is null or in the future.
The problem:
The ActiveRecord query @valid_posts ||= Posts.for_user(user).where('end_date > ? OR end_date IS ?', Time.now.in_time_zone, nil).pluck(:post_set_id) (some further context below)
generates ~15 calls to my database per user per second when testing with Postman, causing significant memory spikes, notably as the number of posts increases. I would only expect 2 at most (one to fetch posts for the user, a second to apply the date constraint), though I am not sure about this.
In the absence of the .where('end_date > ? OR end_date IS ?', Time.now.in_time_zone, nil) clause, there are no memory issues whatsoever. My question, essentially, is why this particular line causes so many queries to the database (which seems to be the cause of the memory spikes), and what an improved implementation would look like.
My reasoning thus far:
My initial suspicion was that I was making an N+1 query, though I no longer believe this to be the case (I compared .select with .where in the query; no significant changes). A third option would possibly be to use .includes, though there is no association between a user and a post, and I do not believe it would be feasible to create one: to my level of understanding, users are a function of an organization, not their own model.
My second thought is that because I am using a time that is precise to the millisecond, the value is ever-changing, and therefore the query runs against the posts table with a new timestamp every time. Would it be possible to capture the current time in a variable and then pass that to the .where statement, rather than an always-varying time as currently implemented? If I am not mistaken, this would amount to a sort of caching mechanism.
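Capturing the timestamp once and reusing it is straightforward, whether or not it turns out to be the root cause here. A minimal sketch of the idea outside Rails (the Post struct is a stand-in for the model, not part of the original code):

```ruby
require 'time'

Post = Struct.new(:post_set_id, :end_date)

posts = [
  Post.new(1, nil),             # no end date: always valid
  Post.new(2, Time.now + 3600), # ends in an hour: valid
  Post.new(3, Time.now - 3600)  # ended an hour ago: filtered out
]

# Capture the comparison time once instead of re-evaluating Time.now
# per record; in Rails you would pass this same value into .where.
now = Time.now
valid = posts.select { |p| p.end_date.nil? || p.end_date > now }
puts valid.map(&:post_set_id).inspect  # => [1, 2]
```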
My third thought was to add an index to end_date on the posts table for quicker lookup, though in itself, I do not believe this to provide a solution.
Some basic context:
While there are many files working together, I have tried to overly-simplify them to essentially reflect the information that I believe is necessary to understand the issue at hand. If there is no identifiable cause for this issue, then perhaps I need to dig into other areas of code.
for_user is a user scope defined below:
user_scope
module UserScopable
extend ActiveSupport::Concern
...
scope(:for_user,
lambda { |poster|
for_user_scope(
{ user_id: poster.user_id, organization_id: poster.organization_id}
)
})
scope(:for_user_scope, lambda { |hash|
where(user_id: hash.fetch(:user_id), organization_id: hash.fetch(:organization_id))
})
@valid_posts is contained within a module, PostSetFilter, and called in the users controller:
users_controller
def post_ids
post_pools = PostSetFilter.new(user: user)
render json: {PostPools: post_pools}
end
Ultimately, there's a lot that I do not know, and it seems like many approaches, so not entirely sure how to proceed. Any guidance about how to reduce the number of queries, and any reasoning as to why would be greatly appreciated.
I am happy to provide further context if needed, though everything points to the aforementioned line as the culprit. Thank you in advance.
I have a private instance user model method like below:-
def generate_authentication_token
loop do
token = Devise.friendly_token
break token unless self.class.unscoped.where(authentication_token: token).first
end
end
Let us assume the user table has 100 trillion records. If I run the above method and the token generated on each iteration matches an existing record, the loop will iterate 100 trillion times, and the database will perform a lookup each time. Is there any way to make this faster by reducing iterations and DB hits (given that each iteration finds a matching record)? And seriously, sorry if the question makes no sense; let me know and I will delete it ASAP. Thanks. Happy coding.
The code looks fine. find_by(authentication_token: token) is generally preferred over where(...).first for retrieving a single record.
Make sure you add an index for the authentication_token column. That would drastically speed up the query if you have many records. If you don't know about database indexing, see this answer.
I wouldn't worry about optimizing the number of iterations or db hits, as the chances that you will find a matching token even on the first try are slim.
Why not use something like this?
token = Digest::MD5.hexdigest(rand.to_s << Time.now.to_i.to_s)
Here I am creating a random number, appending the current timestamp to it, and taking the MD5 hash of that string. That makes a reasonably good secret token, and it will be unique.
If you still are not satisfied with the uniqueness criteria then just append the ID of the user at the start.
There is no query involved in it
You are looking to create a random yet unique token. Sounds like a UUID to me. You can try https://github.com/assaf/uuid
Your approach will work but will be inefficient for the 100 trillion users you are aiming at:-)
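Ruby's standard library also covers this without an extra gem: SecureRandom can produce version-4 UUIDs and URL-safe tokens directly, which is a reasonable substitute for the gem linked above (the byte length below is an arbitrary choice, not from the original):

```ruby
require 'securerandom'

# RFC 4122 version-4 UUID: 122 random bits, so collisions are
# astronomically unlikely even at very large user counts.
uuid = SecureRandom.uuid
puts uuid  # e.g. "2d931510-d99f-494a-8c67-87feb05e1594"

# URL-safe token from 20 random bytes, similar in spirit to
# Devise.friendly_token:
token = SecureRandom.urlsafe_base64(20)
puts token
```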
I have a Rails app with PostgreSQL.
I'm trying to implement a method to suggest alternative names for a certain resource, if the user input has been already chosen.
My reference is slack:
Is there any solution that could do this efficiently?
For efficiently I mean: using only one or also a small set of queries. A pure SQL solution would be great, though.
My initial implementation looked like this:
def generate_alternative_names(model, column_name, count)
words = model[column_name].split(/[,\s\-_]+/).reject(&:blank?)
candidates = 100.times.map! { |i| generate_candidates_using_a_certain_strategy(i, words) }
already_used = model.class.where(column_name => candidates).pluck(column_name)
(candidates - already_used).first(count)
end
# Usage example:
model = Domain.new
model.name = 'hello-world'
generate_alternative_names(model, :name, 5)
# => ["hello_world", "hello-world2", "world_hello", ...]
It generates 100 candidates, then checks the database for matches and removes them from the candidates list. Finally it returns the first count values extracted.
This method is a best effort implementation, as it works for small sets of suggestions, that have few conflicts (in my case, 100 conflicts).
Even if I increase this magic number (100), it does not scale indefinitely.
Do you know a method to improve this, so it can scale for large number of conflicts and without using magic numbers?
I would go with reversed approach: query the database for existing records using LIKE and then generate suggestions skipping already taken:
def alternatives(model, column, word, count)
taken = model.class.where("#{column} LIKE ?", "%#{word}%").pluck(column)
count.times.map! do |i|
generate_candidates_using_a_certain_strategy(i, taken)
end
end
Make generate_candidates_using_a_certain_strategy receive an array of already-taken words to skip. There could be a glitch with a race condition when two requests take the same name, but I don't think it should cause any problems, since you are always free to apologize if the actual creation fails.
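generate_candidates_using_a_certain_strategy is the asker's placeholder; a hypothetical version that skips the taken list could look like this (the numeric-suffix strategy and the default base word are assumptions for illustration):

```ruby
# Hypothetical strategy: append an increasing numeric suffix to the
# base word, and disambiguate further if the result is already taken.
def generate_candidates_using_a_certain_strategy(i, taken, base = "hello-world")
  candidate = "#{base}#{i + 2}"  # hello-world2, hello-world3, ...
  candidate = "#{candidate}-alt" while taken.include?(candidate)
  candidate
end

taken = ["hello-world2"]
puts (0..2).map { |i| generate_candidates_using_a_certain_strategy(i, taken) }.inspect
# => ["hello-world2-alt", "hello-world3", "hello-world4"]
```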
I've been going round in circles for a few days trying to solve a problem which I've also struggled with in the past. Essentially its an issue of understanding the best (or an efficient) way to perform multiple queries on a model as I'm regularly finding my pages are very slow to load.
Consider the situation where you have a model called Everything. Initially you perform a query which finds those records in Everything which match certain criteria
@chosenrecords = Everything.where('name LIKE ?', 'What I want').order('price ASC')
I want to remember the contents of @chosenrecords as I will present them to the user as a list; however, I would also like to understand more about the attributes of @chosenrecords, for instance:
@minimumprice = @chosenrecords.first
@numberofrecords = @chosenrecords.count
When I use the above code in my controller and inspect the command history on the local server, I am surprised to find that each of the three statements issues an SQL query against the original Everything model, rather than remembering the records returned in @chosenrecords and operating on those. This seems very inefficient to me, and indeed each of the three queries takes the same amount of time to process, making the page slow.
I am more experienced in writing codes in software like MATLAB where once you've calculated the value of a variable it is stored locally and can be quickly interrogated, rather than recalculating that variable on each occasion you want to know more information about it. Please could you guide me as to whether I am just on the wrong track completely and the issues I've identified are just "how it is in Rails" or whether there is something I can do to improve it. I've looked into concepts like using a scope, defining a different variable type, and caching, but I'm not quite sure what I'm doing in each case and keep ending up in a similar hole.
Thanks for your time
You are partially on the wrong track. Rails 3 comes with Arel, which defers the query until data is required. In your case, you have built the query but are executing it twice: once with .first and again with .count. What I have done below is run the query once, load all the results into an array, and work on that array in the next two lines.
Perform the queries like this:-
@chosenrecords = Everything.where('name LIKE ?', 'What I want').order('price ASC').all
@minimumprice = @chosenrecords.first
@numberofrecords = @chosenrecords.size
It will solve your issue.
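The underlying idea in plain Ruby, with a lambda standing in for the database call (the row data here is made up for illustration): force the query once, keep the array, and interrogate the array instead of the database.

```ruby
# Stand-in for the database call; each hash plays the role of a row,
# already sorted by price ascending.
run_query = lambda do
  [{ name: 'What I want', price: 5 },
   { name: 'What I want', price: 9 }]
end

chosenrecords   = run_query.call      # executes the "SQL" exactly once
minimumprice    = chosenrecords.first # plain array access, no new query
numberofrecords = chosenrecords.size  # ditto

puts minimumprice[:price]  # => 5
puts numberofrecords       # => 2
```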
I need to find all records for a particular resource and display them in a random order, but with consistent pagination (you won't see the same record twice if you start paging). The display order should be randomized each time a user visits a page. I am using will_paginate. Any advice?
Store a random number in the user session cookies, then use that as seed for your database random function. This will be the same until the user closes their browser, and thus they will all see random, consistent records:
Get a large, random number:
cookies[:seed] = SecureRandom.random_number.to_s[2..20].to_i
Use this seed with e.g. MySQL:
SomeModel.order("RAND(#{cookies[:seed].to_i})")
This is not standard to my knowledge. I can see a use for this for instance for online tests.
I would suggest using a list per session/user. So when a user first goes to the page, you determine a list of ID's, in a random order, and all consecutive views you will use this list to show the correct order for that user/session.
I hope the number of rows is limited; then this would make sense, for instance for tests. Also, if a user left a test before finishing it completely, they could continue where they left off. But maybe that is not relevant for you.
Hope this helps.
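A sketch of this per-session list idea, assuming the IDs fit comfortably in the session; the shuffle is seeded so the same order is reproduced on every page view (the seed value and page size are arbitrary for illustration):

```ruby
# Seed stored once per session, e.g. session[:seed] ||= rand(2**31);
# the same seed reproduces the same shuffled order on every request.
seed = 12_345
ids  = (1..20).to_a  # all record IDs for the resource

shuffled = ids.shuffle(random: Random.new(seed))

# Consistent pagination: slice the stored order page by page.
per_page = 5
page1 = shuffled[0, per_page]
page2 = shuffled[per_page, per_page]

puts (page1 & page2).empty?  # => true, no record appears on two pages
```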
If you're using a database such as MySQL that has a randomize function such as RAND(), you can just add that to your pagination query like so:
Resource.paginate( ... :order => "RAND()" ... )
Check out some of the comments here regarding performance concerns: https://rails.lighthouseapp.com/projects/8994/tickets/1274-patch-add-support-for-order-random-in-queries
Not sure if you still need help with this. One solution I've used in the past is to run the query with RAND but without pagination at first, then store those record IDs and use the stored list to look up and paginate from there. The initial RAND query could be set to run only when the page is 1 or nil. Just a thought.
I ended-up with this solution that worked for me on Postgres:
session[:seed] ||= rand() # setseed expects a value in [-1, 1]; rand gives [0, 1)
seed = session[:seed]
Product.select("setseed(#{seed})").first # seed Postgres's RNG for this connection
Product.order('random()').limit(10).offset(params[:offset])