I have a model A, and a model B. A has_and_belongs_to_many Bs, and vice versa.
Now, I want to find an object/entity within A that has_and_belongs_to certain objects within B (say B1 and B2). How can I do that efficiently within Rails? My current solution is something like this:
A.all.select {|a| a.bs.sort == [B1, B2]}.first
It basically iterates through all objects within A and checks if it has_and_belongs_to the correct Bs. That is very inefficient. Is there a better way to do this?
You can do this with nested sub-queries, which is a working solution but not necessarily an efficient one, so you'll have to run some benchmarks.
The following involves three nested queries performed on the join_table between A and B.
You first determine the id's of all B's (call these excluded_bs) that are not either B1 or B2. Then, you determine which A's are in a relationship with these excluded_bs and call them excluded_as. All the A's that are not in excluded_as are exactly the ones we want (call them included_as). Once you have included_as just query the A table.
excluded_bs = %(SELECT B_id FROM join_table WHERE B_id NOT IN (:included_bs))
excluded_as = %(SELECT A_id FROM join_table WHERE B_id IN (#{excluded_bs}))
included_as = %(SELECT A_id FROM join_table WHERE A_id NOT IN (#{excluded_as}))
A.where("id IN (#{included_as})", :included_bs => [B1.id, B2.id])
This should give you all the A's that are in a relationship with exactly B1 and B2, but not with any others. You might be able to clean this up a bit and make it more efficient, but it should at least work.
EDIT:
Whoops! To trim off those that only have either B1 or B2, try a GROUP BY. Change the last sub-query to
included_as = %(SELECT A_id, COUNT(*) AS Total FROM join_table WHERE A_id NOT IN (#{excluded_as}) GROUP BY A_id HAVING COUNT(*) = :count)
and the main query to
Bs = [B1, B2]
A.where("id IN (SELECT A_id FROM (#{included_as}) AS matched)", :included_bs => Bs.map(&:id), :count => Bs.count)
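To sanity-check what the sub-queries are meant to compute, here is a minimal plain-Ruby sketch of the same set logic over an in-memory join table. The JOIN_ROWS data and the as_with_exactly helper are hypothetical names made up for illustration; no database is involved, and the sketch assumes each (A_id, B_id) pair appears only once.

```ruby
require "set"

# Hypothetical join-table rows as [a_id, b_id] pairs (illustration only).
JOIN_ROWS = [
  [1, 10], [1, 20],          # A1 has exactly B10 and B20
  [2, 10],                   # A2 has only B10
  [3, 10], [3, 20], [3, 30]  # A3 has an extra B30
].freeze

# Returns the ids of A's linked to exactly the given B ids, mirroring
# the excluded_as / included_as / GROUP BY ... HAVING steps.
def as_with_exactly(rows, wanted_b_ids)
  wanted = wanted_b_ids.to_set
  # excluded_as: A's linked to any B outside the wanted set
  excluded_as = rows.reject { |_, b| wanted.include?(b) }.map(&:first).to_set
  # included_as: the remaining A's, kept only if their row count matches
  rows.reject { |a, _| excluded_as.include?(a) }
      .group_by(&:first)
      .select { |_, linked| linked.size == wanted.size }
      .keys
end

p as_with_exactly(JOIN_ROWS, [10, 20]) # prints [1]
```

A2 is dropped by the count check and A3 by the exclusion step, which is the behaviour the edited query is aiming for.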
You can filter on habtm associations:
A.joins(:bs).where("bs.id" => [B1.id, B2.id]).first
A.joins(:bs).where("bs.id" => [B1.id, B2.id]).all
To ensure that only the items associated with both B1 and B2 are returned (items that also have other Bs will still match), group the rows and count the matches:
A.joins(:bs).where("bs.id" => [B1.id, B2.id]).group("as.id").having("COUNT(bs.id) = 2")
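As a rough illustration of what the grouped query computes, here is a plain-Ruby sketch over an in-memory join table. The rows and the as_having_all helper are hypothetical and stand in for the join table and the WHERE ... GROUP BY ... HAVING COUNT steps; each [a_id, b_id] pair is assumed to be unique.

```ruby
# Hypothetical [a_id, b_id] join rows (illustration only).
HABTM_ROWS = [[1, 10], [1, 20], [2, 10], [3, 10], [3, 20], [3, 30]].freeze

# Mirrors: WHERE b_id IN (...) GROUP BY a_id HAVING COUNT(b_id) = n
def as_having_all(rows, wanted_b_ids)
  rows.select { |_, b| wanted_b_ids.include?(b) }   # WHERE b_id IN (...)
      .group_by(&:first)                            # GROUP BY a_id
      .select { |_, matched| matched.size == wanted_b_ids.size } # HAVING
      .keys
end

p as_having_all(HABTM_ROWS, [10, 20]) # prints [1, 3]
```

A3 is returned even though it is also linked to B30, because the WHERE filter removes that row before the count is taken. If "exactly these and no others" is required, an extra exclusion step is needed.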
Related
In the following book club example with associations:
class User
has_and_belongs_to_many :clubs
has_and_belongs_to_many :books
end
class Club
has_and_belongs_to_many :users
has_and_belongs_to_many :books
end
class Book
has_and_belongs_to_many :users
has_and_belongs_to_many :clubs
end
given a specific club record:
club = Club.find(params[:id])
how can I find all the users in the club who have all books in array of books?
club.users.where_has_all_books(books)
In PostgreSQL it can be done with a single query. (Maybe in MySQL too, I'm just not sure.)
So, some basic assumptions first. 3 tables: clubs, users and books, every table has id as a primary key. 3 join tables, books_clubs, books_users, clubs_users, each table contains pairs of ids (for books_clubs it will be [book_id, club_id]), and those pairs are unique within that table. Quite reasonable conditions IMO.
Building a query:
First, let's get ids of books from given club:
SELECT book_id
FROM books_clubs
WHERE club_id = 1
ORDER BY book_id
Then get users from given club, and group them by user.id:
SELECT CU.user_id
FROM clubs_users CU
JOIN users U ON U.id = CU.user_id
JOIN books_users BU ON BU.user_id = CU.user_id
WHERE CU.club_id = 1
GROUP BY CU.user_id
Combine these two queries by adding a HAVING clause to the second query:
HAVING array_agg(BU.book_id ORDER BY BU.book_id) @> ARRAY(##1##)
where ##1## is the 1st query.
What's going on here: the array_agg function on the left builds a sorted array of book_ids; these are the user's books. ARRAY(##1##) on the right returns the sorted list of the club's books. The @> operator checks whether the first array contains all elements of the second (i.e. whether the user has all the books of the club).
Since 1st query needs to be performed only once, it can be moved to WITH clause.
Your complete query:
WITH club_book_ids AS (
SELECT book_id
FROM books_clubs
WHERE club_id = :club_id
ORDER BY book_id
)
SELECT CU.user_id
FROM clubs_users CU
JOIN users U ON U.id = CU.user_id
JOIN books_users BU ON BU.user_id = CU.user_id
WHERE CU.club_id = :club_id
GROUP BY CU.user_id
HAVING array_agg(BU.book_id ORDER BY BU.book_id) @> ARRAY(SELECT * FROM club_book_ids);
It can be verified in this sandbox: https://www.db-fiddle.com/f/cdPtRfT2uSGp4DSDywST92/5
Wrap it in find_by_sql and that's it.
Some notes:
ordering by book_id is not necessary; the @> operator works with unordered arrays too. I just have a suspicion that comparing ordered arrays is faster.
JOIN users U ON U.id = CU.user_id in 2nd query is only necessary for fetching user properties; in case of fetching user ids only it can be removed
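The containment check can be pictured in plain Ruby with sets. This is only a sketch of the logic, not of the SQL execution: the three constant arrays are made-up stand-ins for the join tables, and users_with_all_club_books is a hypothetical helper name.

```ruby
require "set"

# Hypothetical join-table data (illustration only).
BOOKS_CLUBS = [[1, 1], [2, 1], [3, 2]].freeze        # [book_id, club_id]
CLUBS_USERS = [[1, 100], [1, 101], [2, 102]].freeze  # [club_id, user_id]
BOOKS_USERS = [[1, 100], [2, 100], [5, 100],
               [1, 101], [3, 102]].freeze            # [book_id, user_id]

# Returns ids of club members whose books include every book of the club.
def users_with_all_club_books(club_id)
  club_books = BOOKS_CLUBS.select { |_, c| c == club_id }.map(&:first).to_set
  member_ids = CLUBS_USERS.select { |c, _| c == club_id }.map(&:last)
  member_ids.select do |user_id|
    user_books = BOOKS_USERS.select { |_, u| u == user_id }.map(&:first).to_set
    user_books.superset?(club_books) # plays the role of the array containment check
  end
end

p users_with_all_club_books(1) # prints [100]
```

User 100 owns an extra book (5) and still matches, since containment only requires that every club book is present.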
It appears to work by grouping and counting.
club.users.joins(:books).where(books: { id: club.books.pluck(:id) }).group('users.id').having('count(*) = ?', club.books.count)
If anyone knows how to run the query without intermediate queries that would be great and I will accept the answer.
This looks like a situation where you'd make two queries: one to get all the ids you need, and another to perform the select with a WHERE IN.
Imagine these associations:
class Bookshelf
has_many :book_associations, dependent: :destroy
has_many :books, through: :book_associations
end
class Book
has_many :book_associations, dependent: :destroy
has_many :bookshelves, through: :book_associations
end
class BookAssociation
belongs_to :book
belongs_to :bookshelf
end
I need to find all bookshelves that contain a book with ID A and a book with ID B, C, or D
I can do this in a multi-step process using ruby like:
bookshelf_ids1 = Book.find(A).bookshelves.pluck(:id)
bookshelf_ids2 = Book.where(id: [B, C, D]).map(&:bookshelves).flatten.uniq.pluck(:id)
Bookshelf.where(id: bookshelf_ids1 & bookshelf_ids2)
But there must be a way to do this in one line, either through ActiveRecord or a raw SQL query.
What this question boils down to is that you're looking for an intersection of Bookshelf objects in set A (contains book with ID a) and Bookshelf objects in set B (contains book with ID in array b).
I don't recall seeing any easy way to express this intersection using a single ActiveRecord query. And as you probably have suspected, a multi query approach wouldn't scale well. Why run three queries when you can run one?
So here's my solution:
Finding the Bookshelf IDs
SELECT DISTINCT b1.bookshelf_id
FROM book_associations b1
INNER JOIN book_associations b2 ON b1.bookshelf_id = b2.bookshelf_id
WHERE b1.book_id = 1 AND b2.book_id IN (2,3,4);
This is a bit complicated, so let me break it down. We are joining the book_associations table to itself on bookshelf_id. This has the effect of making two copies of the table available for filtering. We then filter copy b1 by the first criterion (book_id = 1), and copy b2 by the other criterion (book_id IN (2,3,4)). The INNER JOIN ensures that we only get the bookshelves that satisfy both conditions, i.e. the intersection. We select only the bookshelf_id because the bookshelves are all we are after. Finally, the DISTINCT keyword is the SQL equivalent of .uniq and ensures the returned values are unique.
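To make the intersection concrete, here is a small plain-Ruby sketch of what the self-join returns. ASSOCIATIONS is a made-up stand-in for the book_associations table and shelves_with is a hypothetical helper; neither comes from the original code.

```ruby
# Hypothetical book_associations rows as [bookshelf_id, book_id] (illustration only).
ASSOCIATIONS = [[1, 1], [1, 2], [2, 1], [3, 3], [4, 1], [4, 3], [4, 4]].freeze

# Bookshelves that hold the required book AND at least one of the others.
def shelves_with(rows, required_book_id, any_of_book_ids)
  with_required = rows.select { |_, book| book == required_book_id }.map(&:first)
  with_any      = rows.select { |_, book| any_of_book_ids.include?(book) }.map(&:first)
  with_required & with_any # Array#& also removes duplicates, like DISTINCT
end

p shelves_with(ASSOCIATIONS, 1, [2, 3, 4]) # prints [1, 4]
```

Shelf 2 holds book 1 but none of the others, and shelf 3 holds book 3 but not book 1, so both are left out.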
Retrieving the Bookshelves
From here, we then need to instantiate the Bookshelf objects. While we could do this:
sql = ActiveRecord::Base.sanitize_sql_array(["SELECT DISTINCT b1.bookshelf_id
FROM book_associations b1
INNER JOIN book_associations b2 ON b1.bookshelf_id = b2.bookshelf_id
WHERE b1.book_id = ? AND b2.book_id IN (?)", 1, [2,3,4]])
bookshelf_ids = ActiveRecord::Base.connection.select_values(sql)
bookshelves = Bookshelf.find(bookshelf_ids)
It's still two steps. Here's a single-step solution:
Bookshelf.find_by_sql(["SELECT * FROM bookshelves bs
WHERE bs.id IN (SELECT DISTINCT b1.bookshelf_id
FROM book_associations b1
INNER JOIN book_associations b2 ON b1.bookshelf_id = b2.bookshelf_id
WHERE b1.book_id = ? AND b2.book_id IN (?)
)", first_id, second_ids])
The Bookshelf.find_by_sql call instantiates records from the query results. The bookshelf_ids are retrieved in the subquery and used as a condition for the query on the bookshelves table.
Caveats
I haven't tested this code and can't confirm it will work, but the broad strokes should be correct.
The SQL should be valid for PostgreSQL, but may require some tweaks depending on your specific DB implementation.
Edit
I've corrected the code above. I had joined the book_associations table on the wrong column (id) instead of the correct column (bookshelf_id), and the subquery was returning the wrong column (again id, when it should have been bookshelf_id).
I've created a proof of concept with test included.
For example, I have three tables named A, B and C. A can have many Bs and B can have many Cs. Only under one condition, A.example_condition = 1, will A have exactly one B and that B exactly one C.
I want to get C directly from A in that case, so I tried this:
class A < ApplicationRecord
has_one :special_c, -> { joins("INNER JOIN B ON B.id = C.b_id INNER JOIN A ON A.id = B.a_id")
.where("A.example_condition = 1") }, class_name: "C"
end
But when I run the query above, ActiveRecord automatically appends this condition to the WHERE clause:
A.id = C.a_id
Obviously, the C table doesn't have an a_id column (it is stored in the B table). My question is: how can I fix the query above for my purpose, or is there a better way to solve this problem?
Thanks.
Tables:
#leads
id | user_id |created_at | updated_at
#users
id | first_name
#todos
id | deadline_at | target_id
I want to get a unique list of leads between two dates (deadline_at), ordered by todos.deadline_at desc.
I do:
SELECT distinct(leads.*), todos.deadline_at
FROM leads
INNER JOIN users ON users.id = leads.user_id
LEFT JOIN todos ON todos.target_id = leads.user_id
WHERE (todos.deadline_at between '2015-11-26T00:00:00+00:00' and '2015-11-26T23:59:59+00:00')
ORDER BY todos.deadline_at DESC;
This query returns a correctly ordered list, but with duplicates. If I use DISTINCT or DISTINCT ON with leads.id, PostgreSQL requires me to use it in ORDER BY as well, and in that case I get a wrongly ordered list.
How can I achieve the expected result?
Since you don't really need the users table, maybe try this:
Lead.joins("INNER JOIN todos ON leads.user_id = todos.target_id")
.where("todos.deadline_at" => (date_a..date_b))
.select("leads.*, todos.deadline_at")
.order("todos.deadline_at desc")
It seems that you're confusing the raw result of a SQL query with joins and the result after ActiveRecord has processed the association.
I presume Lead has_many :todos, through: :user, so you can do this:
Lead.eager_load(:todos).
where("todos.deadline_at" => (date_a..date_b)).
order("todos.deadline_at")
No need to apply distinct or anything else; ActiveRecord will separate the leads from the todos, and you'll have them in the right order with no duplicates. The raw SQL result, however, will have plenty of duplicates.
If you want to achieve something similar in sql alone, you can use distinct or group by on leads.id, but then you'll lose all the todos it "contains". However you can use aggregate function to calculate/extract things on the "lost todo data".
For example :
Lead.joins(:todos).
group("leads.id").
select("leads.*, min(todos.deadline_at) as first_todo_deadline").
order("first_todo_deadline")
Notice that todos data is only available in the aggregate functions (min, count, avg, etc) since the todos are "compressed" if you wish in each lead!
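The "compression" can be pictured with a small plain-Ruby sketch: group the todo rows by lead, keep one aggregate per group, and sort on it. The TODOS data and the leads_by_first_deadline helper are hypothetical and only illustrate the grouping, not ActiveRecord itself.

```ruby
require "date"

# Hypothetical todo rows as [lead_id, deadline] (illustration only).
TODOS = [
  [1, Date.new(2015, 11, 26)],
  [1, Date.new(2015, 11, 24)],
  [2, Date.new(2015, 11, 25)]
].freeze

# One entry per lead, carrying only the aggregate (earliest deadline).
def leads_by_first_deadline(todos)
  todos.group_by(&:first)                                       # GROUP BY leads.id
       .map { |lead_id, rows| [lead_id, rows.map(&:last).min] } # MIN(deadline_at)
       .sort_by(&:last)                                         # ORDER BY the aggregate
end

leads_by_first_deadline(TODOS).each do |lead_id, first_deadline|
  puts "#{lead_id}: #{first_deadline}"
end
```

Each lead appears once, so there are no duplicates, but the individual todo rows are no longer available, only the value computed from them.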
Hope it makes sense.
This is totally a beginner question. I'm embarrassed to be asking it, but here goes.
We have two models: Person and Order.
Person has attributes :first_name, :last_name, :age. Order has one attribute, :total.
Person has_many :orders, and Order belongs_to :person.
Let's assume that some data has been entered for both models.
Now, we test this relationship in the console:
p = Person.first
o = Order.new(total: 100)
o.person = p # this is equivalent to: o.person_id = p.id, yes?
o.save
p.orders
My questions stem from line 3 and line 5.
Question 1: Why do we have to say o.person instead of o in line 3?
Question 2: Why are we saying p.orders in line 5?
Question 3: What does this, o.person_id = p.id, mean exactly? I'm assuming it's associating the tables with each other?
Let me know if this question is unclear.
Thank you for your help!
Question 1: Why do we have to say "o.person instead of o" in line 3?
Order belongs to a Person, so on that line you specify the exact person who owns that order o by typing o.person = p. o = p doesn't make any sense.
Question 2: And why are we saying "p.orders" in line 5?
Because each Person has many orders, so you can get them by typing p.orders
Question 3: Also, what does this "o.person_id = p.id" mean exactly? I'm assuming it's associating the tables with each other?
Yes, this sets the owner of the order.
Ah, I see additional question:
o.person = p (this is equivalent to: o.person_id = p.id, yes?)
Not always, but in most cases. Say, for polymorphic associations it will not only set id, but also a type.
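As a rough mental model of the common (non-polymorphic) case, the writer can be imitated in plain Ruby. The classes below are hypothetical stand-ins written for illustration, not the Rails models, and they leave out persistence entirely.

```ruby
# Hypothetical stand-ins for the models (no Rails, illustration only).
Person = Struct.new(:id, :first_name)

class Order
  attr_accessor :total, :person_id
  attr_reader :person

  def initialize(total:)
    @total = total
  end

  # Assigning the object also records its id, which is roughly what the
  # association writer does for a plain belongs_to.
  def person=(person)
    @person = person
    @person_id = person&.id
  end
end

owner = Person.new(3, "Tom")
o = Order.new(total: 100)
o.person = owner
puts o.person_id # prints 3
```

Setting o.person_id = owner.id directly would store the same id, but o.person would not know about the object until it was looked up again.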
Question 1: Why do we have to say "o.person instead of o" in line 3?
When you declare that an Order object belongs_to :person, Rails expects a column called person_id in the Orders table (it is normally added by a migration). In the Orders table, the column names are the attributes of an Order object, and you refer to the attributes of an Order object using dot notation, e.g. o.total.
As a convenience, rails lets you assign the whole Person object to an Order attribute named person, then rails extracts the Person id, and rails inserts the id into the person_id column in the Orders table.
Your question is sort of like asking, why you have to write:
o.total = 10
instead of
o = 10
The last line does not tell Rails which column in the Orders table the value should go in.
A table is just a grid of column names and values:
Orders:
id  total  person_id  timestamp1  timestamp2
1   10     1          1234567     4567890
2   30     3          12342134    1324123423
3   20     1          1341234324  12341342344
Then if you write:
o = Order.find(2)
Then o will be assigned an Order object whose values for total, person_id, timestamp1, and timestamp2 will be the values in the row where the id is equal to 2.
Next, if you write:
o = 10
What does that mean? Does it mean that all columns for o's row should be set to the value 10? Set the first column to 10? Set the last column to 10? Isn't it much clearer to write o.total = 10?
Question 2: And why are we saying "p.orders" in line 5?
That retrieves all the orders associated with a Person object--remember you declared that a Person object has_many Orders. Once again, that is a convenience provided by Rails--not declaring the associations would force you to write:
target_person_id = 1
@orders = Order.where(person_id: target_person_id)
Question 3: Also, what does this "o.person_id = p.id" mean exactly?
I'm assuming it's associating the tables with each other?
p is a Person object, e.g. one of these rows:
People:
id  first   last   middle  order_id  timestamp1  timestamp2
1   Tom     Thumb  T       1         4567890     1234456
2   Wizard  Id     of      3         1324123423  123434
3   Tom     Thumb  T       2         2134234     1234234
If p is a Person object created from the last row of values, then p.id is equal to 3, which means that the line:
o.person_id = p.id
is equivalent to:
o.person_id = 3
Next, o is an Order object, and the Orders table has a column named person_id, which is the column that the belongs_to :person declaration relies on, and the line:
o.person_id = 3
instructs rails to insert 3 for the value of o's person_id column. If o's id is 1, then you get this:
Orders:
   id  total  person_id  timestamp1  timestamp2
=> 1   10     3 <=       1234567     4567890
   2   30     3          12342134    1324123423
   3   20     1          1341234324  12341342344