Unique Random Record in Ruby - ruby-on-rails

I have a Question table with an id and a title column.
Now I need to randomly select 5 questions from the table. I have seen people using:
Question.order("RANDOM()").limit(5) # using Postgres
Till now I have:
def selectr
  @randquestion = []
  while @randquestion.length < 3 do
    Question.uncached do
      ques = Question.order("RANDOM()").first
      @randquestion << ques
    end
  end
end
I found the uncaching trick in "Ruby on Rails Active Record RANDOM() always the same within a loop".
But I am not sure whether this will give me unique questions, and I only want 3 unique questions.

You can do this by fetching each question via a separate SQL request, with subsequent requests excluding IDs for records already seen. Here's an example, tested with Rails 5.2 and MySQL:
def random_questions(number)
  already_seen = []
  number.times.map do
    question = Question.order('RAND()').where.not(id: already_seen).first
    already_seen << question.id
    question
  end
end
If you try this out with random_questions(3), you'll see:
Question Load (1.5ms) SELECT `questions`.* FROM `questions` WHERE 1=1 ORDER BY RAND() LIMIT 1
Question Load (1.4ms) SELECT `questions`.* FROM `questions` WHERE `questions`.`id` != 2 ORDER BY RAND() LIMIT 1
Question Load (1.3ms) SELECT `questions`.* FROM `questions` WHERE `questions`.`id` NOT IN (2, 1) ORDER BY RAND() LIMIT 1
As an aside, please note that order('RAND()') triggers a deprecation warning in newer versions of Rails:
DEPRECATION WARNING: Dangerous query method (method whose arguments are used as raw SQL) called with non-attribute argument(s): "RAND()". Non-attribute arguments will be disallowed in Rails 6.0. This method should not be called with user-provided values, such as request parameters or model attributes. Known-safe values can be passed by wrapping them in Arel.sql().
To avoid this warning, use .order(Arel.sql('RAND()')) instead.
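Applied to the helper above, that change would look like this (a sketch, assuming the same Rails 5.2 / MySQL setup):
def random_questions(number)
  already_seen = []
  number.times.map do
    # Arel.sql marks the literal as known-safe, which silences the deprecation warning
    question = Question.order(Arel.sql('RAND()')).where.not(id: already_seen).first
    already_seen << question.id
    question
  end
end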

Related

ERROR: column "pitch_count" does not exist when it exists in an alias

I'm running the following query in Rails 5, with the goal of finding the user with the most Pitches:
User
  .select("users.*, COUNT(user_id) as pitch_count")
  .unscoped
  .joins("LEFT JOIN pitches AS pitches ON pitches.user_id = users.id")
  .group("pitch.user_id")
  .order("pitch_count DESC")
  .limit(5)
But I'm getting the error:
Caused by PG::UndefinedColumn: ERROR: column "pitch_count" does not exist
Why isn't the query orderable by pitch_count?
The problem is the unscoped method. It removes all previously defined scopes, including the select statement. See the following example:
User.select(:full_name, :email).unscoped.to_sql
# => SELECT "users".* FROM "users"
User.unscoped.select(:full_name, :email).to_sql
# => SELECT "users"."full_name", "users"."email" FROM "users"
See the difference? Calling unscoped after the select completely removes everything defined in that select.
For you this means you should call unscoped right after the model name:
User
  .unscoped
  .select("users.*, COUNT(user_id) as pitch_count")
  .joins("LEFT JOIN pitches AS pitches ON pitches.user_id = users.id")
  .group("pitch.user_id")
  .order("pitch_count DESC")
  .limit(5)
Note: the newlines were added mostly for readability, but the code works like this in your Ruby files. If you want to execute it in the Rails console, you will have to remove the newlines.
By the way, you might still get an error saying that column "users.id" must appear in the GROUP BY clause or be used in an aggregate function. That can be fixed by changing the group statement to use users.id instead of pitch.user_id:
.group("users.id")
I suggest you use a counter_cache, which is easy to maintain and good for performance as well. With a counter cache in place, you can get the user record with the most pitches via User.reorder(pitches_count: :desc).first, as sketched below.
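As a rough sketch of that counter_cache setup (assuming a Pitch model with belongs_to :user; the migration class name is illustrative):
# Migration: add the counter column (the default name is <association>_count)
class AddPitchesCountToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :pitches_count, :integer, default: 0, null: false
  end
end

# app/models/pitch.rb
class Pitch < ApplicationRecord
  # counter_cache: true keeps users.pitches_count in sync on create/destroy
  belongs_to :user, counter_cache: true
end

# The "user with the most pitches" then becomes a simple ordered read:
User.reorder(pitches_count: :desc).first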

Rails: Why does where(id: objects) work?

I have the following statement:
Customer.where(city_id: cities)
which results in the following SQL statement:
SELECT customers.* FROM customers WHERE customers.city_id IN (SELECT cities.id FROM cities...
Is this intended behavior? Is it documented somewhere? I will not use the Rails code above and will instead use one of the following:
Customer.where(city_id: cities.pluck(:id))
or
Customer.where(city: cities)
which results in the exact same SQL statement.
The AREL querying library allows you to pass in ActiveRecord objects as a short-cut. It'll then pass their primary key attributes into the SQL it uses to contact the database.
When looking for multiple objects, the AREL library will attempt to find the information in as few database round-trips as possible. It does this by holding the query you're making as a set of conditions, until it's time to retrieve the objects.
This way would be inefficient:
users = User.where(age: 30).all
# ^^^ get all these users from the database
memberships = Membership.where(user_id: users)
# ^^^^^ This will pass in each of the ids as a condition
Basically, this way would issue two SQL statements:
select * from users where age = 30;
select * from memberships where user_id in (1, 2, 3);
Each of these involves a call over a network port between the applications, and the data then has to be passed back across that same port.
This would be more efficient:
users = User.where(age: 30)
# This is still a query object, it hasn't asked the database for the users yet.
memberships = Membership.where(user_id: users)
# Note: this line is the same, but users is an AREL query, not an array of users
It will instead build a single, nested query so it only has to make a round-trip to the database once.
select * from memberships
where user_id in (
select id from users where age = 30
);
So, yes, it's expected behaviour. It's a bit of Rails magic, it's designed to improve your application's performance without you having to know about how it works.
There are also some cool optimisations: for example, if you call first or last instead of all, it will only retrieve one record.
User.where(name: 'bob').all
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob'
User.where(name: 'bob').first
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' AND ROWNUM <= 1
Or if you set an order, and call last, it will reverse the order then only grab the last one in the list (instead of grabbing all the records and only giving you the last one).
User.where(name: 'bob').order(:login).first
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login) WHERE ROWNUM <= 1
User.where(name: 'bob').order(:login).last
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login DESC) WHERE ROWNUM <= 1
# Notice, login DESC
Why does it work?
Something deep in the ActiveRecord query builder is smart enough to see that if you pass an array or a query/criteria, it needs to build an IN clause.
Is this documented anywhere?
Yes, http://guides.rubyonrails.org/active_record_querying.html#hash-conditions
2.3.3 Subset conditions
If you want to find records using the IN expression you can pass an array to the conditions hash:
Client.where(orders_count: [1,3,5])
This code will generate SQL like this:
SELECT * FROM clients WHERE (clients.orders_count IN (1,3,5))
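A quick way to see both behaviours for yourself is to inspect the generated SQL with to_sql; the cities scope below and the commented output are illustrative:
cities = City.where(country: 'NL')  # still an unevaluated relation

Customer.where(city_id: cities).to_sql
# => SELECT "customers".* FROM "customers"
#    WHERE "customers"."city_id" IN (SELECT "cities"."id" FROM "cities" WHERE ...)

Customer.where(city_id: [1, 2, 3]).to_sql
# => SELECT "customers".* FROM "customers" WHERE "customers"."city_id" IN (1, 2, 3)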

DEPRECATION WARNING: Dangerous query method: Random Record in ActiveRecord >= 5.2

So far, the "common" way to get a random record from the Database has been:
# PostgreSQL
Model.order("RANDOM()").first
# MySQL
Model.order("RAND()").first
But, when doing this in Rails 5.2, it shows the following Deprecation Warning:
DEPRECATION WARNING: Dangerous query method (method whose arguments are used as raw SQL) called with non-attribute argument(s): "RANDOM()". Non-attribute arguments will be disallowed in Rails 6.0. This method should not be called with user-provided values, such as request parameters or model attributes. Known-safe values can be passed by wrapping them in Arel.sql().
I am not really familiar with Arel, so I am not sure what would be the correct way to fix this.
If you want to continue using order by random() then just declare it safe by wrapping it in Arel.sql like the deprecation warning suggests:
Model.order(Arel.sql('random()')).first # PostgreSQL
Model.order(Arel.sql('rand()')).first # MySQL
There are lots of ways of selecting a random row, each with advantages and disadvantages. But there are times when you absolutely must use a snippet of SQL in an order by (such as when you need the order to match a Ruby array and have to send a big case when ... end expression to the database), so using Arel.sql to get around this "attributes only" restriction is a tool we all need to know about.
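For example, getting rows back in the same order as a Ruby array of ids needs exactly this kind of raw fragment (a sketch; the model and ids are illustrative, and the interpolation is only safe because the ids are known integers, not user input):
ids = [42, 7, 19]  # the order we want the rows returned in

# Build a CASE expression mapping each id to its position in the array
order_sql = "CASE id " +
            ids.each_with_index.map { |id, i| "WHEN #{id} THEN #{i}" }.join(' ') +
            " END"

Model.where(id: ids).order(Arel.sql(order_sql))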
I'm a fan of this solution:
Model.offset(rand(Model.count)).first
With many records, and not many deleted records, this may be more efficient. In my case I have to use .unscoped because my default scope uses a join. If your model doesn't use such a default scope, you can omit .unscoped wherever it appears.
Patient.unscoped.count #=> 134049
class Patient
  def self.random
    return nil unless Patient.unscoped.any?
    patient = nil
    until patient
      # find_by returns nil when the random id hits a deleted record, so the loop retries
      patient = Patient.unscoped.find_by(id: rand(Patient.unscoped.last.id))
    end
    patient
  end
end
#Compare with other solutions offered here in my use case
puts Benchmark.measure{10.times{Patient.unscoped.order(Arel.sql('RANDOM()')).first }}
#=>0.010000 0.000000 0.010000 ( 1.222340)
Patient.unscoped.order(Arel.sql('RANDOM()')).first
Patient Load (121.1ms) SELECT "patients".* FROM "patients" ORDER BY RANDOM() LIMIT 1
puts Benchmark.measure {10.times {Patient.unscoped.offset(rand(Patient.unscoped.count)).first }}
#=>0.020000 0.000000 0.020000 ( 0.318977)
Patient.unscoped.offset(rand(Patient.unscoped.count)).first
(11.7ms) SELECT COUNT(*) FROM "patients"
Patient Load (33.4ms) SELECT "patients".* FROM "patients" ORDER BY "patients"."id" ASC LIMIT 1 OFFSET 106284
puts Benchmark.measure{10.times{Patient.random}}
#=>0.010000 0.000000 0.010000 ( 0.148306)
Patient.random
(14.8ms) SELECT COUNT(*) FROM "patients"
# also
Patient.unscoped.find_by(id: rand(Patient.unscoped.last.id))
Patient Load (0.3ms) SELECT "patients".* FROM "patients" ORDER BY "patients"."id" DESC LIMIT 1
Patient Load (0.4ms) SELECT "patients".* FROM "patients" WHERE "patients"."id" = $1 LIMIT 1 [["id", 4511]]
The reason is that we use rand to pick a random ID and then just look up that single record. However, the more deleted rows (skipped ids) there are, the more likely the lookup loop will execute multiple times. It might be overkill, but it could be worth the 62% increase in performance, and even more if you never delete rows. Test whether it's better for your use case.

Postgresql error with Rails 3 using order("RANDOM()")

I'm trying to query my DB for records that are similar to the currently viewed record (based on taggings). I have that working, but I would like to randomize the order.
My development environment is MySQL, so I would do something like:
@tattoos = Tattoo.tagged_with(tags, :any => true).order("RAND()").limit(6)
which works, but my production environment is Heroku, which uses PostgreSQL, so I tried this:
@tattoos = Tattoo.tagged_with(tags, :any => true).order("RANDOM()").limit(6)
but I get the following error:
ActionView::Template::Error (PGError: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
SELECT DISTINCT tattoos.* FROM "tattoos" JOIN taggings
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477 ON
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.taggable_id = tattoos.id AND
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.taggable_type = 'Tattoo' WHERE
(tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 3 OR
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 4 OR
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 5 OR
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 24 OR
tattoos_taggings_color_fantasy_newschool_nerdy_tv_477.tag_id = 205) ORDER BY RANDOM() LIMIT 6):
After analyzing the query more closely, I have to correct my first draft. The query would require a DISTINCT or GROUP BY the way it is.
The (possibly) duplicate tattoos.* come from first joining to (possibly) multiple rows in the table taggings. Your query engine then tries to get rid of such duplicates again by using DISTINCT - in a syntactically illegal way.
DISTINCT basically sorts the resulting rows by the resulting columns from left to right and picks the first row of each set of duplicates. That's why the leftmost ORDER BY columns have to match the SELECT list.
MySQL is more permissive and allows the non-standard use of DISTINCT, but PostgreSQL throws an error.
ORMs often produce ineffective SQL statements (they are just crutches after all). However, if you use appropriate PostgreSQL libraries, such an illegal statement shouldn't be produced to begin with. I am no Ruby expert, but something's fishy here.
The query is also very ugly and inefficient.
There are several ways to fix it. For instance:
SELECT *
FROM (<query without ORDER BY and LIMIT>) x
ORDER BY RANDOM()
LIMIT 6
Or, better yet, rewrite the query with this faster, cleaner alternative doing the same:
SELECT ta.*
FROM tattoos ta
WHERE EXISTS (
SELECT 1
FROM taggings t
WHERE t.taggable_id = ta.id
AND t.taggable_type = 'Tattoo'
AND t.tag_id IN (3, 4, 5, 24, 205)
)
ORDER BY RANDOM()
LIMIT 6;
You'll have to implement it in Ruby yourself.
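Translated back into ActiveRecord, it might look roughly like this (a sketch that bypasses tagged_with; the hard-coded tag ids are just the ones from the error above):
tag_ids = [3, 4, 5, 24, 205]

exists_sql = "EXISTS (SELECT 1 FROM taggings t " \
             "WHERE t.taggable_id = tattoos.id " \
             "AND t.taggable_type = 'Tattoo' " \
             "AND t.tag_id IN (?))"

@tattoos = Tattoo.where(exists_sql, tag_ids).order("RANDOM()").limit(6)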
Not sure about the random error, as it should work.
But take a look at http://railsforum.com/viewtopic.php?id=36581
which has code that might suit you:
/lib/agnostic_random.rb
module AgnosticRandom
  def random
    case DB_ADAPTER
    when "mysql" then "RAND()"
    when "postgresql" then "RANDOM()"
    end
  end
end
/initializers/extend_ar.rb (name doesn't matter)
ActiveRecord::Base.extend AgnosticRandom
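With that in place, the original query can be written database-agnostically (assuming DB_ADAPTER is defined in your app, as in the linked post):
@tattoos = Tattoo.tagged_with(tags, :any => true).order(Tattoo.random).limit(6)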

Rails 3 LIKE query raises exception when using a double colon and a dot

In Rails 3.0.0, the following query works fine:
Author.where("name LIKE :input",{:input => "#{params[:q]}%"}).includes(:books).order('created_at')
However, when I enter the following search string (containing a colon followed by a dot):
aa:.bb
I get the following exception:
ActiveRecord::StatementInvalid: SQLite3::SQLException: ambiguous column name: created_at
These are the SQL queries in the logs:
with aa as input:
Author Load (0.4ms) SELECT "authors".* FROM "authors" WHERE (name LIKE 'aa%') ORDER BY created_at
Book Load (2.5ms) SELECT "books".* FROM "books" WHERE ("books".author_id IN (1,2,3)) ORDER BY id
with aa:.bb as input:
SELECT DISTINCT "authors".id FROM "authors" LEFT OUTER JOIN "books" ON "books"."author_id" = "authors"."id" WHERE (name LIKE 'aa:.bb%') ORDER BY created_at DESC LIMIT 12 OFFSET 0
SQLite3::SQLException: ambiguous column name: created_at
It seems that with the aa:.bb input, an extra query is made to fetch the distinct author ids.
I thought Rails would escape all the characters. Is this expected behaviour or a bug?
Best Regards,
Pieter
The "ambiguous column" error usually happens when you use includes or joins and don't specify which table you're referring to:
"name LIKE :input"
Should be:
"authors.name LIKE :input"
Just "name" is ambiguous if your books table has a name column too.
Also: have a look at your development.log to see what the generated query looks like. This will show you if it's being escaped properly.
Replace
.includes(:books)
with
.preload(:books)
This should force ActiveRecord to use 2 queries instead of the join.
Rails has 2 versions of includes: one which constructs a big query with joins (the 2nd of your 2 queries, and thus more likely to result in ambiguous column references), and one that avoids the joins in favour of a separate query per association.
Rails decides which strategy to use based on whether it thinks your conditions, order, etc. refer to the included tables (since in that case the joins version is required). Where a condition is a string fragment, that heuristic isn't very sophisticated - I seem to recall that it just scans the conditions for anything that might look like a column from another table (i.e. foo.bar), so a literal of that form can fool it.
You can either qualify your column names so that it doesn't matter which includes strategy is used or you can use preload/eager_load instead of includes. These behave similarly to includes but force a specific include strategy rather than trying to guess which is most appropriate.
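A minimal sketch of the three strategies side by side (the commented behaviour is approximate):
# preload: always two separate queries, no JOIN, so an unqualified "name" stays unambiguous
Author.where("name LIKE :input", input: "#{params[:q]}%").preload(:books).order(:created_at)

# eager_load: always one LEFT OUTER JOIN query, so column names must be qualified
Author.where("authors.name LIKE :input", input: "#{params[:q]}%").eager_load(:books).order("authors.created_at")

# includes: lets Rails guess between the two strategies based on your conditions
Author.where("authors.name LIKE :input", input: "#{params[:q]}%").includes(:books).order("authors.created_at")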
