I have a class called Membership. There could be multiple records with the same email. To retrieve all Membership records by email, I am creating an index on email and doing:
Membership.where(email: "example@example.com")
How expensive is the above operation? Is there a more efficient way of doing this?
Thanks!
I think the most efficient way is with a group query like this:
Membership.select(:email).group(:email).having("count(*) > 1")
This will generate the following query:
SELECT "memberships"."email" FROM "memberships" GROUP BY "memberships"."email" HAVING (count(*) > 1)
That should work on PG and MySQL.
Hope it helps
EDIT
If you want to see how many times each duplicated email occurs, put a count at the end:
Membership.select(:email).group(:email).having("count(*) > 1").count
This will give a hash with each duplicated email and its number of occurrences, like this:
{ 'some@duplicate.com' => 2, ...etc}
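For intuition about what the group/having/count chain returns, here is a minimal sketch that mimics it on a plain Ruby array. It is only an illustration: the addresses are made up, no database is involved, and it assumes Ruby 2.7+ for tally. It says nothing about how ActiveRecord or the database actually executes the query.

```ruby
# Illustration only: plain Ruby over made-up data, no database involved.
emails = [
  "a@example.com",
  "b@example.com",
  "a@example.com",
  "c@example.com",
  "b@example.com",
  "a@example.com"
]

# Rough analogue of GROUP BY email with COUNT(*):
# tally maps each value to its number of occurrences.
counts = emails.tally

# Rough analogue of HAVING count(*) > 1:
# keep only the groups that contain more than one row.
duplicates = counts.select { |_email, n| n > 1 }

p duplicates
```

On a real table you would still want the database to do this work, since loading every email into Ruby defeats the purpose of the index.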
Related
There are two tables: User and Teacher. Teacher.user_id references User. How do I find, in a single query, all the users who are not in teachers?
I meant something along the lines of:
User.not_in(Teacher.all)
You can use the where.not query from ActiveRecord. Try something like this:
User.where.not(id: Teacher.pluck(:user_id).compact)
Note: compact drops nil values, in case some teacher records have no user_id.
The other answers seem to have overlooked the rails 3 tag (since removed based on the approved answer; my answer is left for posterity). Please try this:
User.where("id NOT IN (?)", Teacher.pluck(:user_id).compact)
This becomes SELECT * FROM users WHERE id NOT IN (....). It runs two queries (one to get the user_id values from teachers and another to get the users not in that list) and may fail if the teachers table is large.
Another option is an Arel table:
users = User.arel_table
User.where(users[:id].not_in(Teacher.select(:user_id).where("user_id IS NOT NULL")))
This should produce a single query similar to
SELECT * FROM users
WHERE id NOT IN ( SELECT user_id FROM teachers WHERE user_id IS NOT NULL)
(One query, so better performance.) Note: the syntax was not fully tested.
Another single query option might be
User.joins("LEFT OUTER JOIN teachers ON teachers.user_id = users.id").
where("teachers.user_id IS NULL")
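To make the NOT IN / anti-join logic concrete, here is a minimal in-memory sketch. UserRow and TeacherRow are hypothetical structs standing in for the two tables, and the data is made up. It also shows why the NULL filter matters: in SQL, a NOT IN list that contains NULL matches no rows at all, so the nil user_id is dropped before the comparison.

```ruby
require "set"

# Hypothetical in-memory rows standing in for the users and teachers tables.
UserRow = Struct.new(:id, :name)
TeacherRow = Struct.new(:id, :user_id)

users = [UserRow.new(1, "ann"), UserRow.new(2, "bob"), UserRow.new(3, "cy")]
teachers = [TeacherRow.new(10, 1), TeacherRow.new(11, nil), TeacherRow.new(12, 3)]

# Analogue of SELECT user_id FROM teachers WHERE user_id IS NOT NULL.
teacher_user_ids = teachers.map(&:user_id).compact.to_set

# Analogue of WHERE id NOT IN (...): keep users with no matching teacher row.
non_teachers = users.reject { |u| teacher_user_ids.include?(u.id) }

p non_teachers.map(&:name)
```

The single-query versions above let the database do the same set difference without pulling any ids into Ruby.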
I think you should be able to do something like this:
User.where.not(id: Teacher.where.not(user_id: nil).select(:user_id))
I'm trying to find, in a table of 3M rows, all the users who have the same username. I read that something like this may do the trick:
User.find(:all, :group => [:username], :having => "count(*) > 1" )
However, since I'm using Postgres, this returns ActiveRecord::StatementInvalid: PG::Error: ERROR: column "users.id" must appear in the GROUP BY clause or be used in an aggregate function.
I'm trying something like this:
User.select('users.id, users.username').having("count(*) > 1").group('users.username')
But still get the same error. Any idea what I'm doing wrong?
Update: I somehow made it work using User.select('users.*').group('users.id').having('count(users.username) > 1'), but this query returns what looks like an empty array even though it seems to find 5 records:
GroupAggregate (cost=9781143.40..9843673.60 rows=3126510 width=1365)
Filter: (count(username) > 1)
-> Sort (cost=9781143.40..9788959.68 rows=3126510 width=1365)
Sort Key: id
-> Seq Scan on users (cost=0.00..146751.10 rows=3126510 width=1365)
(5 rows)
=> []
Any idea why this is happening and how to get those 5 rows?
I think the best you could get is to get usernames for duplicate records. That can be achieved with
User.select(:username).group(:username).having('COUNT(username) > 1')
GROUP BY in a database collapses each group into one row of output. Most likely what you intend is produced by the following query:
User.where("username IN (SELECT username FROM users GROUP BY username HAVING count(*) > 1)").order(:username)
The inner query finds all usernames that appear more than once. The outer query then finds all rows with those usernames. Ordering by username will make further processing easier. To speed this up, add an index on the username column of the users table.
There are alternative Postgres-specific ways to solve this; however, the above will work across all databases.
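The two-step logic (find the duplicated values, then fetch every row carrying them) can be sketched in plain Ruby. UserRow is a hypothetical struct and the rows are made up; the tie-break on id in the sort is my own addition so that the output is deterministic, it is not part of the SQL.

```ruby
# Hypothetical rows standing in for the users table.
UserRow = Struct.new(:id, :username)

rows = [
  UserRow.new(1, "zed"),
  UserRow.new(2, "amy"),
  UserRow.new(3, "zed"),
  UserRow.new(4, "kim"),
  UserRow.new(5, "amy")
]

# Inner query analogue: usernames that occur more than once.
duplicated = rows.group_by(&:username)
                 .select { |_name, group| group.size > 1 }
                 .keys

# Outer query analogue: every row with such a username, ordered by username
# (id is used only as a tie-break to keep the order stable).
result = rows.select { |r| duplicated.include?(r.username) }
             .sort_by { |r| [r.username, r.id] }

p result.map { |r| [r.id, r.username] }
```

Unlike the plain GROUP BY version, this keeps the full rows, which is what you need if you want to inspect or merge the duplicates.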
I have a relationship between two models, Register and Competition. I have a very complicated dynamic query that is being built, and if the conditions are right I need to limit Register records to only those whose parent Competition meets certain criteria. To do this without selecting from the Competition table, I was thinking of something along the lines of:
Register.where("competition_id in ?", Competition.where("...").collect {|i| i.id})
Which produces this SQL:
SELECT "registers".* FROM "registers" WHERE (competition_id in 1,2,3,4...)
I don't think PostgreSQL liked the fact that the IN parameters aren't surrounded by parentheses. How can I compare the Register foreign key to a list of competition ids?
You can make it a bit shorter and skip the collect (this worked for me in 3.2.3):
Register.where(competition_id: Competition.where("..."))
This will result in the following SQL:
SELECT "registers".* FROM "registers" WHERE "registers"."competition_id" IN (SELECT "competitions"."id" FROM "competitions" WHERE "...")
Try this instead:
competitions = Competition.where("...").collect {|i| i.id}
Register.where(:competition_id => competitions)
I've got a Rails ActiveRecord query that finds all the records where the name matches some token.
records = Market.where("lower(name) = ?", name.downcase)
rec = records.first
count = records.count
The server shows that the calls for .first and .count were BOTH hitting the database.
CACHE (0.0ms) SELECT "markets".* FROM "markets" WHERE (lower(name) = 'my market') LIMIT 1
CACHE (0.0ms) SELECT COUNT(*) FROM "markets" WHERE (lower(name) = 'my market')
Why is it going to the database to get the count when it can use the results already queried?
I'm concerned about future performance. Today there are 1000 records. When that table holds 8 million rows, doing two queries, one for the data and one for the count, will be expensive.
How do I get the count from the collection, not the database?
ActiveRecord uses lazy queries to fetch data from the database. If you simply want to count the records, you can load them into an array and call size on it:
records = Market.where("lower(name) = ?", name.downcase ).all
records.size
So, records is an ActiveRecord::Relation. You would think it's an array of all your Market records that match your where criteria, but it's not. Each time you call something like first or count on that relation, it performs a query to retrieve what you're asking for.
To get the actual records into an array, add .all to the relation to actually retrieve them (this returns an Array in Rails 3; newer versions of Rails use to_a for this). Like:
records = Market.where("lower(name) = ?", name.downcase).all
count = records.count
For Rails 6.0.1 and Ruby 2.6.5
You will need to store the results in an array by using to_a.
records = Market.where("lower(name) = ?", name.downcase).to_a
This will create the SQL query and store the results in the array records.
Then, when you call either records.first or records.count it will only return the data or do the calculation, not rerun a query. This is the same for records.size and records.length.
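The difference between calling methods on a lazy relation and on a loaded array can be pictured with a toy model. FakeRelation below is a hypothetical class of my own, not ActiveRecord and not how ActiveRecord is implemented; it only counts simulated round trips so the two usage patterns can be compared.

```ruby
# Toy model only: this is NOT ActiveRecord, just a stand-in that counts "queries".
class FakeRelation
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # Simulates SELECT ... LIMIT 1: one round trip.
  def first
    @query_count += 1
    @rows.first
  end

  # Simulates SELECT COUNT(*): one round trip.
  def count
    @query_count += 1
    @rows.size
  end

  # Simulates loading everything once and handing back a plain Array.
  def to_a
    @query_count += 1
    @rows.dup
  end
end

# Pattern 1: call first and count on the lazy object.
lazy = FakeRelation.new(["Market A", "Market B"])
lazy.first
lazy.count
puts lazy.query_count # two simulated queries

# Pattern 2: load once with to_a, then work on the Array.
source = FakeRelation.new(["Market A", "Market B"])
records = source.to_a
records.first
records.count
puts source.query_count # one simulated query
```

The trade-off is memory: loading everything is only cheaper when you actually need the rows, not just their number.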
Another Example
I needed to do this for a blog I am developing. I was trying to run a query to find all of the tags associated with a post, and I wanted to count how many tags there were. This was causing multiple queries until I came across the to_a suffix.
So, my query looks like this:
@tags = TagMap.where(post_id: @post).joins(:tag).select(:id, '"tags"."name"').to_a
This looks through my TagMap table for all records that have post_id equal to the id of the post that I am viewing. It then joins to the Tags table and pulls only the id of the TagMap record and the name of the tag from the Tags table. Then it puts them all into an array. I can then run @tags.count and it will return the number of TagMap records for that post without doing another query.
I hope that this helps anyone using Rails 6+
I have 2 models: Purchase and User. User has_many :purchases, and Purchase belongs_to :user. I want to select the following:
How many distinct users have made a purchase in the last 3 months, and also a way to print their email addresses ('Purchase' has a created_at field) (I want to just do this using the console). I'm a little confused as to how to go about it from a ruby perspective (I could do the straight SQL query, but I'd like to figure out how to do it in Ruby).
User.joins(:purchases).where("purchases.created_at > ?", 3.month.ago)
To get a distinct list of users (on Rails 5 and later, use distinct instead of uniq):
User.joins(:purchases).where("purchases.created_at > ?", 3.month.ago).uniq
To get a distinct collection of user emails:
User.joins(:purchases).where("purchases.created_at > ?", 3.month.ago).uniq.pluck(:email)
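As a rough picture of what the join, the date filter and the distinct step do together, here is an in-memory sketch. UserRow, PurchaseRow, the dates and the fixed cutoff (standing in for 3.months.ago) are all made up for illustration; none of this touches a database.

```ruby
# Hypothetical in-memory data; names and dates are invented for illustration.
UserRow = Struct.new(:id, :email)
PurchaseRow = Struct.new(:user_id, :created_at)

# Fixed cutoff standing in for 3.months.ago, so the result is reproducible.
cutoff = Time.utc(2024, 3, 30)

users = {
  1 => UserRow.new(1, "ann@example.com"),
  2 => UserRow.new(2, "bob@example.com"),
  3 => UserRow.new(3, "cy@example.com")
}

purchases = [
  PurchaseRow.new(1, Time.utc(2024, 6, 1)),
  PurchaseRow.new(1, Time.utc(2024, 5, 15)), # same user, second purchase
  PurchaseRow.new(2, Time.utc(2023, 12, 1)), # older than the cutoff
  PurchaseRow.new(3, Time.utc(2024, 4, 2))
]

recent_emails = purchases
  .select { |pr| pr.created_at > cutoff }     # WHERE purchases.created_at > ?
  .map { |pr| users.fetch(pr.user_id).email } # JOIN users
  .uniq                                       # DISTINCT

puts recent_emails.size
puts recent_emails
```

The size answers "how many distinct users", and the array itself is the list of emails to print in the console.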
Rails 2: Purchase.all(:conditions => ["created_at > ?", 3.months.ago]).map(&:user).uniq
Rails 3: Purchase.where("created_at > ?", 3.months.ago).map(&:user).uniq
It always depends on the data structure and how many records you have.
If this is in a console on your dev environment, do whatever you like; otherwise a good SQL query will perform better.
Select only the fields that are needed, and use pluck where possible to get ids.
Use those ids to do a straight query.
To do a comparison such as < or >, you will have to use an SQL string.
If you only need emails, add a select to the query so you get 'select email from users;' instead of 'select * from users'.
So:
User.joins(:purchases).where('purchases.created_at > ?', 3.months.ago).pluck('distinct(email)')
This will return an array of your users' emails.
Apart from that, have you read this and tried that?
Purchase.where("created_at > ?", 3.months.ago).map(&:user).uniq