Rails Random Active Record with Pagination - ruby-on-rails

I need to find all records for a particular resource and display them in a random order, but with consistent pagination (you won't see the same record twice if you start paging). The display order should be randomized each time a user visits a page. I am using will_paginate. Any advice?

Store a random number in the user's session cookie, then use it as the seed for your database's random function. The seed stays the same until the user closes their browser, so each user sees a random but consistent ordering across pages:
Get a large, random number:
cookies[:seed] = SecureRandom.random_number.to_s[2..20].to_i
Use this seed with e.g. MySQL:
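SomeModel.order("RAND(#{cookies[:seed].to_i})") # RAND(N) with a constant seed repeats the same order; newer Rails may require wrapping the string in Arel.sql(...)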
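Why a fixed seed gives consistent pages can be sketched in plain Ruby, without a database: the same seed always produces the same permutation, so consecutive pages are disjoint slices of one ordering. This is only an analogue of a seeded RAND(N), not how MySQL implements it; `random_page`, `ids`, and the seed value are hypothetical stand-ins.

```ruby
# Hypothetical stand-ins: `ids` plays the role of the table's ids and
# `seed` the value stored in cookies[:seed].
def random_page(ids, seed, page, per_page)
  # Same seed => same permutation, so pages never overlap or skip records.
  shuffled = ids.shuffle(random: Random.new(seed))
  # slice returns nil past the end of the array, so fall back to an empty page.
  shuffled.slice((page - 1) * per_page, per_page) || []
end

ids = (1..12).to_a
first_page = random_page(ids, 20_240_101, 1, 5)
```

A new visit would simply pick a new seed, which reshuffles everything while keeping paging stable within that visit.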

This is not standard, as far as I know. I can see a use for it in, for instance, online tests.
I would suggest using a list per session/user. When a user first goes to the page, you determine a list of IDs in a random order, and on all subsequent views you use this list to show the correct order for that user/session.
I hope the number of rows is limited, in which case this makes sense, for instance for tests. Also, if a user leaves a test before finishing it, they could continue where they left off. But maybe that is not relevant for you.
Hope this helps.
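A minimal sketch of the per-session list, assuming `session` is a plain Hash standing in for the Rails session and `all_ids` stands in for the resource's ids (both names are illustrative, not part of the answer):

```ruby
# `session` is a Hash standing in for the Rails session; `all_ids` stands in
# for the ids of the resource. Both are assumptions for illustration.
def ids_for_page(session, all_ids, page, per_page)
  # Decide the random order once, on the first visit, and reuse it afterwards.
  session[:ordered_ids] ||= all_ids.shuffle
  session[:ordered_ids].slice((page - 1) * per_page, per_page) || []
end
```

In Rails, a `WHERE id IN (...)` query does not return rows in the order of the list, so the page's records would need reordering in Ruby, for example with `index_by(&:id).values_at(*page_ids)`. With the default cookie-based session store (about 4 KB), this only suits a limited number of rows, as the answer suggests.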

If you're using a database such as MySQL that has a randomize function such as RAND(), you can just add that to your pagination query like so:
Resource.paginate( ... :order => "RAND()" ... )
Check out some of the comments here regarding performance concerns: https://rails.lighthouseapp.com/projects/8994/tickets/1274-patch-add-support-for-order-random-in-queries

Not sure if you still need help with this. One solution I've used in the past is to run the query with RAND but without pagination at first, then store those record IDs and use that stored list to look up and paginate from there. The initial RAND query could be set to run only when the page is 1 or nil. Just a thought.
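The "reshuffle only on page 1 or nil" rule can be sketched like this, assuming `store` is a Hash standing in for the session and `all_ids` for the result of the unpaginated random query; `paginate_random` is a hypothetical helper name:

```ruby
# `store` stands in for the session; `all_ids` for the ids returned by the
# unpaginated random query. Both names are hypothetical.
def paginate_random(store, all_ids, page_param, per_page)
  page = (page_param || 1).to_i   # params arrive as strings, or nil
  page = 1 if page < 1
  # The "initial RAND query": a fresh order on every visit to the first page.
  store[:random_ids] = all_ids.shuffle if page == 1 || store[:random_ids].nil?
  store[:random_ids].slice((page - 1) * per_page, per_page) || []
end
```

Later pages reuse the stored order, so paging is consistent, while each return to page 1 produces a new random order.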

I ended-up with this solution that worked for me on Postgres:
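session[:seed] ||= rand # Float in [0, 1); PostgreSQL's setseed accepts values in [-1, 1]
seed = session[:seed].to_f # to_f keeps the interpolated value numeric
Product.select("setseed(#{seed})").first # seeds random() for this database connection
Product.order('random()').limit(10).offset(params[:offset].to_i)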
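What `setseed` followed by `ORDER BY random() LIMIT ... OFFSET ...` achieves can be approximated in plain Ruby: a seeded generator gives each row a random sort key, rows are sorted by that key, and the requested window is cut out. This is an analogue, not PostgreSQL's actual algorithm; `seeded_random_window` and the integer seed are illustrative (setseed itself takes a float in [-1, 1]).

```ruby
# Plain-Ruby analogue of setseed + ORDER BY random() LIMIT/OFFSET.
# The helper name and the integer seed are illustrative assumptions.
def seeded_random_window(rows, seed, limit, offset)
  rng = Random.new(seed)
  keyed = rows.map { |row| [rng.rand, row] }   # like random() evaluated once per row
  keyed.sort_by(&:first).map(&:last).drop(offset).first(limit)
end
```

The sort keys depend on the generator's state, which is why the seed has to be applied again on every request. In PostgreSQL the seed affects the current database session, so the `setseed` call and the ordered query need to run on the same connection.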

Related

Randomize Selections in a List of 100

This is a follow-up to the last question I asked: Sort Users by Number of Followers. That code is:
@ordered_users = User.all.sort{|a,b| b.followers.count <=> a.followers.count}
What I hope to accomplish is to take the ordered users, get the top 100 of them, and then randomly choose 5 out of those 100. Is there a way to accomplish this?
Thanks.
users_in_descending_order_of_followers = User.all.sort_by { |u| -u.followers.count }
sample_of_top = users_in_descending_order_of_followers.take(100).sample(5)
You can use sort_by, which can be easier to use than sort, and combine take and sample to get the top 100 users and then sample 5 of them.
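The same sort_by/take/sample chain can be tried out in plain Ruby with in-memory stand-ins; `UserStub` and its `followers_count` field are hypothetical, and with ActiveRecord the block would call `u.followers.count` instead.

```ruby
# In-memory stand-ins for users; UserStub is hypothetical.
UserStub = Struct.new(:name, :followers_count)

users = (1..250).map { |i| UserStub.new("user#{i}", i * 3 % 500) }

# sort_by evaluates the block once per user (not once per comparison).
top_100 = users.sort_by { |u| -u.followers_count }.take(100)
picked  = top_100.sample(5)   # 5 distinct users chosen at random
```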
User.all.sort can potentially cause problems in the long run, depending on the total number of users and the available resources, particularly memory. It is also much slower, because you call .followers.count twice inside the sort block: sort runs that block once per comparison, so you end up issuing on the order of 2 × N log N COUNT queries, N being the number of users. On top of that, User.all.sort executes the User.all query immediately and loads every User record into memory, whereas a plain User.all relation is lazily loaded until you actually use it (for example with .each, or better yet .find_each).
I suggest something like the following (I extended Deekshith's answer to the other question you linked):
User.joins(:followers).group('users.id').order('COUNT(followers.user_id) DESC').limit(100).sample(5)
The chained query methods above are all combined into a single SQL query, which is executed once; .sample(5) then runs on the loaded records as a plain Ruby method (no longer SQL at that point), yielding the result you need.
I would strongly consider using a counter cache on the User model, to hold the count of followers.
This would give a very small performance impact on adding or removing followers, and greatly increase performance when performing sorts:
User.order(followers_count: :desc)
This would be particularly noticeable if you wanted the top-n users by follower count, or finding users with no followers.
User.order(followers_count: :desc).limit(100).sample(5)
This approach will outperform the ones using count(*). Add an index on followers_count for best effect.
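The trade-off behind a counter cache (a little extra work on each write, cheap reads when sorting) can be sketched in plain Ruby. This is not Rails' implementation, which is enabled with the `counter_cache` option on `belongs_to`; `CachedUser` and `top_sample` are hypothetical names.

```ruby
# Plain-Ruby sketch of the counter-cache idea (not Rails' implementation).
# CachedUser and top_sample are hypothetical names for illustration.
class CachedUser
  attr_reader :name, :followers_count

  def initialize(name)
    @name = name
    @followers = []
    @followers_count = 0
  end

  def add_follower(follower)
    return if @followers.include?(follower)
    @followers << follower
    @followers_count += 1            # small extra cost on each write
  end

  def remove_follower(follower)
    @followers_count -= 1 if @followers.delete(follower)
  end
end

# Sorting reads the stored integer instead of counting followers each time.
def top_sample(users, top_n, k)
  users.sort_by { |u| -u.followers_count }.first(top_n).sample(k)
end
```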

Search a relation without a second query

My question is about how to perform varying levels of search into a database while limiting the number of queries.
Let's start simple:
@companies = Company.where("active = ?", true)
Let's say we display records from this set. Then, we need:
@clientcompanies = @companies.where("client_id = ?", @client.id)
We display something from @clientcompanies. Then, we want to drill down further.
@searchcompanies = @clientcompanies.where("name LIKE ? OR notes LIKE ?", "#{params[:search]}%", "#{params[:search]}%")
Are these three statements the most efficient way to go about this?
If indeed the database is starting with the entire Company table each time around, is there a way to limit the scope so each of the above statements would take a shorter amount of time as the size of the set diminishes?
In case it matters, I'm running Rails 3 on both MySQL and PostgreSQL.
It doesn't get much more optimized than what you're already doing. None of those statements executes a SQL query until you try to use the results: calling methods like all, first, inspect, any?, or each is what triggers the query.
Each time you chain on a new where or other Arel method, it is appended to the SQL query that will be executed at the end. If, somewhere in the middle, you want to see the query that will be executed, you can do puts @searchcompanies.to_sql
Note that if you run these commands in the console, each statement appears to run a SQL query only because the console automatically calls .inspect on the line you entered.
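The deferred-execution behavior can be illustrated with a toy class that only builds a query string while chaining and "executes" it on iteration. This loosely mimics ActiveRecord::Relation and is not its real implementation; `LazyQuery` and its `log` are hypothetical.

```ruby
# Toy model of lazy query chaining (NOT ActiveRecord's implementation).
# LazyQuery and its log are hypothetical, for illustration only.
class LazyQuery
  include Enumerable
  attr_reader :log

  def initialize(table, conditions = [], log = [])
    @table, @conditions, @log = table, conditions, log
  end

  # Chaining returns a new object with one more condition; nothing runs yet.
  def where(condition)
    LazyQuery.new(@table, @conditions + [condition], @log)
  end

  def to_sql
    sql = "SELECT * FROM #{@table}"
    return sql if @conditions.empty?
    sql + " WHERE " + @conditions.map { |c| "(#{c})" }.join(" AND ")
  end

  # Only iteration "executes" the query; here it just records the SQL.
  def each(&block)
    @log << to_sql
    [].each(&block) # no real database behind this toy
  end
end

companies = LazyQuery.new("companies").where("active = 1")
client    = companies.where("client_id = 42")
search    = client.where("name LIKE 'Acme%'")
```

After the three chained calls the log is still empty; calling `search.to_a` records exactly one combined query.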
Hopefully I answered your question :)
There's a great Railscast here: http://railscasts.com/episodes/239-activerecord-relation-walkthrough that explains how ActiveRecord::Relation works and what you can do with it.
EDIT:
I may have misunderstood your question. You indicated that after each where call you were displaying information from the query. What's the use case for this? Are you displaying all companies on the same page as the companies filtered by the search? If you display something from that very first query, you will be pulling every single active company row from your database (which is not going to be very scalable or performant with a large number of companies).
Would it not make sense to only display information from the @searchcompanies variable?

Updating several records at once in rails

In a Rails 2 app I'm building, I need to update a collection of records with specific attributes. I have a named scope to find the collection, but I have to iterate over each record to update the attributes. Instead of making one query to update several thousand records, I'll have to make several thousand queries.
What I've found so far is something like Model.find_by_sql("UPDATE products ...")
This feels really junior, but I've googled and looked around SO and haven't found my answer.
For clarity, what I have is:
ps = Product.last_day_of_freshness
ps.each { |p| p.update_attributes(:stale => true) }
What I want is:
Product.last_day_of_freshness.update_attributes(:stale => true)
It sounds like you are looking for ActiveRecord::Base.update_all - from the documentation:
Updates all records with details given if they match a set of conditions supplied, limits and order can also be supplied. This method constructs a single SQL UPDATE statement and sends it straight to the database. It does not instantiate the involved models and it does not trigger Active Record callbacks or validations.
Product.last_day_of_freshness.update_all(:stale => true)
Actually, since this is Rails 2.x (you didn't specify which version), the named_scope chaining may not work; you might need to pass the conditions for your named scope as the second parameter to update_all instead of chaining it onto the end of the Product scope.
Have you tried using update_all ?
http://api.rubyonrails.org/classes/ActiveRecord/Relation.html#method-i-update_all
For those who need to update a large number of records (one million or even more), updating in batches works well.
batch_size = 5000
product_ids = Product.last_day_of_freshness.pluck(:id)
iterations_size = (product_ids.count / batch_size.to_f).ceil # round up so a partial last batch is counted
puts "Products to update #{product_ids.count}"
product_ids.each_slice(batch_size).with_index(1) do |batch_ids, step|
  puts "step #{step} of #{iterations_size}"
  Product.where(id: batch_ids).update_all(stale: true)
end
If your table has a lot of indexes, that will also increase the time for such operations, because every index has to be updated. When I called update_all on all records in a table with about two million records and twelve indexes, the operation had not finished after more than an hour. With this approach it took about 20 minutes in the development environment and about 4 minutes in production; of course, that depends on application settings and server hardware. You can put it in a rake task or a background worker.
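The batching loop above can also be written as a plain-Ruby helper that takes the database call as a block, which makes the batch arithmetic easy to check without Rails. `update_in_batches` is a hypothetical name; each yielded batch would typically become one separate UPDATE statement.

```ruby
# Hypothetical helper: splits ids into batches and yields each batch, so the
# actual update (e.g. Product.where(id: batch_ids).update_all(stale: true))
# is supplied by the caller. Returns the number of batches.
def update_in_batches(ids, batch_size)
  total = (ids.size / batch_size.to_f).ceil
  ids.each_slice(batch_size).with_index(1) do |batch_ids, step|
    puts "step #{step} of #{total}"
    yield batch_ids
  end
  total
end
```

Keeping the batch size as a parameter makes it easy to tune for tables with many indexes.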
Looks like update_all is the best option, though I'll keep my hacky version here in case you're curious:
You can use plain old SQL to do what you want:
ps = Product.last_day_of_freshness
ps_ids = ps.map(&:id).join(',') # local var just for readability
Product.connection.execute("UPDATE `products` SET `stale` = TRUE WHERE id IN (#{ps_ids})") unless ps_ids.empty? # IN () would be invalid SQL
Note that this is DB-dependent; you may need to adjust the quoting style to suit.
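When building the raw UPDATE string by hand, coercing every id to an Integer keeps arbitrary text out of the SQL, and an empty list should produce no statement. A sketch, with `stale_update_sql` as a hypothetical helper name and the MySQL-style backtick quoting taken from the hack above:

```ruby
# Hypothetical helper that builds the raw UPDATE from the hack above.
# Integer() raises ArgumentError for anything that is not a whole number,
# so only numeric ids reach the SQL string.
def stale_update_sql(ids)
  return nil if ids.empty?            # "IN ()" is not valid SQL
  id_list = ids.map { |id| Integer(id) }.join(",")
  "UPDATE `products` SET `stale` = TRUE WHERE id IN (#{id_list})"
end
```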

Display a record sequentially with every refresh

I have a Rails 3 application that currently shows a single "random" record with every refresh; however, it repeats records too often, or never shows a particular record. I was wondering what a good way would be to loop through each record and display them so that all get shown before any are repeated. I was thinking of somehow using cookies or session ids to loop sequentially through the record IDs, but I'm not sure if that would work right, or exactly how to go about it.
The database consists of a single table with a single column, and currently only about 25 entries, but more will be added. IDs are generated automatically and are sequential.
Some suggestions would be appreciated.
Thanks.
The funny thing about 'random' is that it doesn't usually feel random when you get the same answer twice in short succession.
The usual answer to this problem is to generate a queue of responses, and make sure when you add entries to the queue that they aren't already on the queue. This can either be a queue of entries that you will return to the user, or a queue of entries that you have already returned to the user. I like your idea of using the record ids, but with only 25 entries, that repeating loop will also be annoying. :)
You could keep track of the queue of previous entries in memcached if you've already got one deployed or you could stuff the queue into the session (it'll probably just be five or six integers, not too excessive data transfer) or the database.
I think I'd avoid the database, because it sure doesn't need to be persistent, it doesn't need to take database bandwidth or compute time, and using the database just to keep track of five or six integers seems silly. :)
UPDATE:
In one of your controllers (maybe ApplicationController), add something like this to a method that you run in a before_filter:
class ApplicationController < ActionController::Base
  before_filter :find_quip

  private

  def find_quip
    last_quip_id = session[:quip_id] || Quip.first.id
    # find_by_id returns nil instead of raising when the next id doesn't exist
    @quip = Quip.find_by_id(last_quip_id + 1) || Quip.first
    session[:quip_id] = @quip.id
  end
end
I'm not so happy with the code that wraps around when you run out of quips; it will completely screw up if there is ever a hole in the sequence, which is probably going to happen someday. And it will start on number 2. But I'm getting too tired to sort it out. :)
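One way around the two caveats above (holes in the id sequence and starting on number 2) is to pick the smallest id greater than the last one shown and wrap to the smallest id when there is none. A plain-Ruby sketch; `next_quip_id` is hypothetical, and in Rails 3 the lookup might be something like `Quip.where("id > ?", last_id).order("id").first || Quip.order("id").first`.

```ruby
# Hypothetical helper: `ids` stands in for the quips table's ids.
def next_quip_id(ids, last_id)
  sorted = ids.sort
  return nil if sorted.empty?
  return sorted.first if last_id.nil?          # first visit starts at the lowest id
  sorted.find { |id| id > last_id } || sorted.first
end
```

Because it looks for the next existing id rather than `last_id + 1`, deleted records are simply skipped.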
If there are only going to be a few records, as you say, you could store the entire array of IDs as a session variable, with another variable for the current index, and loop through them sequentially, incrementing the index.
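A minimal sketch of the array-plus-index idea, assuming `session` is a Hash standing in for the Rails session and `all_ids` is an array of ids loaded from the quips table; `next_in_cycle` is a hypothetical name.

```ruby
# `session` stands in for the Rails session; `all_ids` for the table's ids.
def next_in_cycle(session, all_ids)
  session[:quip_ids] ||= all_ids
  ids = session[:quip_ids]
  return nil if ids.empty?
  index = session[:quip_index] || 0
  session[:quip_index] = (index + 1) % ids.size   # wrap back to the start
  ids[index % ids.size]
end
```

Storing `all_ids.shuffle` instead of `all_ids` would give a random order in which every record is still shown once before any repeats.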

Ruby rand() cannot accept variables?

I'm a little baffled by this.
My end goal in a RoR project is to grab a single random profile from my database.
I was thinking it would be something like:
@profile = Profile.find_by_user_id(rand(User.count))
It kept throwing an error because user_id 0 doesn't exist, so I pulled parts of it out just to check out what's going on:
@r = rand(User.count)
<%= @r %>
This returns 0 every time. So what's going on? I registered 5 fake users and 5 related profiles to test this.
If I take Profile.find_by_user_id(rand(User.count)) and rewrite it as
Profile.find_by_user_id(3)
it works just fine.
User.count is working too. So I think that rand() can't take an input other than a static integer.
Am I right? What's going on?
Try:
Profile.first(:offset => rand(Profile.count))
As a database ages, especially one with user records, there will be gaps in your ID field sequence. Trying to grab an ID at random will have a potential to fail because you might try to randomly grab one that was deleted.
Instead, if you count the number of records, then randomly go to some offset into the table you will sidestep the possibility of having missing IDs, and only be landing on existing records.
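The difference can be shown in plain Ruby: with gaps in the ids, guessing a random id can miss, while a random offset into the existing rows always lands on a record. `random_by_offset`, `random_by_id`, and `profile_ids` are hypothetical stand-ins.

```ruby
# Hypothetical stand-ins: ids 3, 5 and 6 have been deleted.
profile_ids = [1, 2, 4, 7, 8]

def random_by_offset(rows)
  return nil if rows.empty?
  rows[rand(rows.size)]          # like Profile.first(:offset => rand(Profile.count))
end

def random_by_id(rows, max_id)
  candidate = rand(1..max_id)    # guessing an id may hit a deleted one
  rows.include?(candidate) ? candidate : nil
end

picked = random_by_offset(profile_ids)
```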
The following example from the OP's question could run into some problems unless the integrity of the database is watched very carefully:
profile = Profile.find_by_user_id(rand(User.count))
The problem is that the User table can get out of sync with the Profile table, so the two may have different numbers of records. For instance, if User.count is 3 and there are only two records in Profile, the lookup can fail and return nil, which typically leads to an error as soon as you use the result.
I'm not sure why rand(i) isn't working as you expect (it works fine for me), but this isn't a good way to find a random profile regardless; if a profile is ever deleted, or there are any users without profiles, then this will fail.
I don't think there's an efficient way to do this in Rails using ActiveRecord alone. For a small number of users, you could just load Profile.all and select a random profile from that array, but you'd probably be better off doing something like
@profile = Profile.find_by_sql("SELECT * FROM profiles ORDER BY RAND() LIMIT 1").first
There are many other questions on StackOverflow about how to select a random record in SQL; I'd say this is the easiest, but if you're concerned about efficiency then have a look around and see if there's another implementation you like better.
EDIT: find_by_sql returns an array, so you need to do .first to get a single Profile.
When I want to get a random record in Rails, I go something like this:
@profile = Profile.first(:order => "RAND()")
This usually works, but from what I've read, RAND() is specific to MySQL, or at least not database-independent. Others, such as PostgreSQL and SQLite, use RANDOM().
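To stay database-independent, the function name could be chosen per adapter. A hypothetical helper, assuming the adapter name comes from something like `ActiveRecord::Base.connection.adapter_name` (e.g. "Mysql2", "PostgreSQL", "SQLite"); treat the mapping as an assumption to check against your own database.

```ruby
# Hypothetical helper: map an adapter name to its random-ordering SQL.
def random_order_sql(adapter_name)
  case adapter_name.to_s.downcase
  when /mysql/              then "RAND()"
  when /postgres/, /sqlite/ then "RANDOM()"
  else raise ArgumentError, "unknown adapter: #{adapter_name}"
  end
end
```

It might then be used as `Profile.first(:order => random_order_sql(Profile.connection.adapter_name))`; newer Rails versions may require wrapping the string in `Arel.sql` when passing it to `order`.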
Finding by ID in Rails is redundant
@profile = Profile.find_by_user_id(rand(User.count))
If profile ids line up with user ids, this is redundant; all you need to code is:
@profile = Profile.find(rand(User.count)) # find looks up by the primary key (id), not by user_id
The error on 0 is probably due to Rails' convention-over-configuration defaults, which are just rules people agree upon.
And when using Rails there is no reason to have a user with id 0; auto-incremented ids start at 1 by default.
