Rails includes causing incorrect result - ruby-on-rails

I am using includes to eager load associations on one of my queries. I am trying to query invoices to get the total amount invoiced. I get 2 different results when I use includes vs. when I don't. Can anyone explain why this is happening and how best to fix it?
all_invoices = Invoice.includes(:contractor, :invoice_items, :refunds).with_user(1).date_between(date_range).search_contractor("tester").displayed_invoices.order(created_at: 'DESC')
tester = Invoice.with_user(1).date_between(date_range).search_contractor("tester").displayed_invoices.order(created_at: 'DESC')
all_invoices.pluck(:total_in_cents).sum #this will return 80000
tester.pluck(:total_in_cents).sum #this will return 40000
The 2nd one is the correct result I'm looking for, but having includes in there is obviously helpful for speed, so I'm not trying to remove it. I need to get the correct result with it in place.
Anyone have any idea why this is happening?

You are plucking :total_in_cents from two different result sets.
tester plucks from rows selected from "invoices" only.
all_invoices plucks from rows selected from "invoices" together with the :contractor, :invoice_items and :refunds that meet the same criteria as:
.with_user(1).date_between(date_range).search_contractor("tester")
I assume some of these are placeholders, but .with_user queries a user, and that user probably has other related records in :contractor, :invoice_items or :refunds. If the associations are joined in, an invoice appears once per matching associated row, so its total is counted more than once.
You could test it by adjusting the column names and re-seeding the database or, better yet, by filtering out the records not associated with the invoices and running the same query.
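How a joined result set can double a sum can be sketched in plain Ruby, without a database. The data below is made up (one invoice of 40000 cents with two invoice items) and the join is simulated by hand; it is an assumption about the mechanism, not the SQL Rails generates for the query in the question.

```ruby
# Made-up data: one invoice worth 40000 cents that has two invoice items.
invoice_rows = [{ id: 1, total_in_cents: 40_000 }]
invoice_item_rows = [{ id: 10, invoice_id: 1 }, { id: 11, invoice_id: 1 }]

# Hand-rolled LEFT OUTER JOIN: one output row per (invoice, item) pair,
# and a single row for an invoice that has no items.
joined_rows = invoice_rows.flat_map do |invoice|
  items = invoice_item_rows.select { |item| item[:invoice_id] == invoice[:id] }
  items = [nil] if items.empty?
  items.map { |item| { invoice: invoice, item: item } }
end

# Summing over the joined rows counts the invoice once per item.
joined_total = joined_rows.sum { |row| row[:invoice][:total_in_cents] }
# Summing over the invoices alone, or over distinct invoices, does not.
plain_total = invoice_rows.sum { |invoice| invoice[:total_in_cents] }
distinct_total = joined_rows.map { |row| row[:invoice] }.uniq.sum { |invoice| invoice[:total_in_cents] }
```

If this is what is happening, computing the total on a relation without the includes, or over distinct invoices, should avoid the double counting.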

Related

Optimizing has many record association query

I have this query that I've built using Enumerable#select. The purpose is to find records that have no has many associated records or, if a record does have them, to select only those with its preview attribute set to true. The code below works perfectly for that use case. However, this query does not scale well. When I test against thousands of records it takes several hundred seconds to complete. How can this query be improved?
# User has many enrollments
# Enrollment belongs to user.
users_with_no_courses = User.includes(:enrollments).select {|user| user.enrollments.empty? || user.enrollments.where(preview: false).empty?}
So first, make sure enrollments.user_id has an index.
Second, you can speed this up by not loading all the enrollments, and doing your filtering in SQL:
User.where(<<-EOQ)
NOT EXISTS (SELECT 1
FROM enrollments e
WHERE e.user_id = users.id
AND NOT e.preview)
EOQ
By the way here I'm simplifying your two conditions into one: "no enrollments or no real enrollments" is the same as "no real enrollments".
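The equivalence of the two conditions can be checked in plain Ruby. This is a small sketch on made-up data (the user names and preview flags are hypothetical), independent of ActiveRecord:

```ruby
# Hypothetical data: user name => preview flags of that user's enrollments.
previews_by_user = {
  "no_enrollments" => [],
  "only_previews"  => [true, true],
  "has_real"       => [true, false]
}

# The original two-part condition: no enrollments, or no non-preview ones.
two_part = ->(flags) { flags.empty? || flags.reject { |flag| flag }.empty? }
# The simplified condition: no non-preview ("real") enrollments.
one_part = ->(flags) { flags.none? { |flag| !flag } }

matches_two_part = previews_by_user.select { |_name, flags| two_part.call(flags) }.keys
matches_one_part = previews_by_user.select { |_name, flags| one_part.call(flags) }.keys
```

Both conditions pick the same users because an empty list of enrollments trivially contains no non-preview enrollment.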
If you want you can put this condition into a scope so it is more reusable.
Third, this is still going to be slow if you're instantiating thousands of User objects. So I would look into paginating if that makes sense, or find_each if this is an offline script. Or use raw SQL to avoid all the object instances.
Oh by the way: even though you are saying includes(:enrollments), this will still go back to the database, giving you an n+1 problem:
user.enrollments.where(preview: false)
That is because the where builds a new query, so ActiveRecord can't use the already-loaded association. You can avoid that by filtering the loaded records with Enumerable#select instead of where. But not loading the enrollments in the first place is even better.

ruby on rails manys' many

I am wondering how to do this without double each loop.
Assume that I have a user model and order model, and user has_many orders.
Now I have a users collection whose class is User::ActiveRecord_Relation.
How can I get the orders of these users in one line?
Actually, the best way to do that is:
users.includes(:orders).flat_map(&:orders)
because it eager loads the users' orders (only 2 SQL queries).
Or
Order.where(user_id: users.pluck(:id))
which is quite good too in terms of performance.
If you've got a has_many association and you need to quickly load all the associated orders, you will need to be careful to avoid the so-called "N plus 1" queries that can result from the most obvious approach:
orders = users.collect(&:orders).flatten
This will iterate over each user and run a separate query like SELECT * FROM orders WHERE user_id=? for every one of them.
What you really want is this:
orders = Order.where(user_id: users.collect(&:id))
That should find all orders from all users in a single query.
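The difference in query count can be illustrated without a database. The sketch below uses a hypothetical FakeOrderStore that counts every lookup as one query; it only mimics the shape of the two approaches and is not how ActiveRecord is implemented.

```ruby
# A stand-in for the database; every call to find_orders counts as one query.
# FakeOrderStore and its data are hypothetical, for illustration only.
class FakeOrderStore
  attr_reader :query_count

  def initialize(orders)
    @orders = orders
    @query_count = 0
  end

  # Mimics: SELECT * FROM orders WHERE user_id IN (...)
  def find_orders(user_ids)
    @query_count += 1
    @orders.select { |order| user_ids.include?(order[:user_id]) }
  end
end

order_rows = [
  { id: 1, user_id: 1 }, { id: 2, user_id: 1 }, { id: 3, user_id: 2 }
]
all_user_ids = [1, 2, 3]

# One lookup per user: N queries for N users.
per_user_store = FakeOrderStore.new(order_rows)
per_user_orders = all_user_ids.flat_map { |id| per_user_store.find_orders([id]) }

# One lookup for all users at once.
batched_store = FakeOrderStore.new(order_rows)
batched_orders = batched_store.find_orders(all_user_ids)
```

Both approaches return the same orders here; only the number of round trips differs, and it grows with the number of users in the first case.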
An answer came to mind just after I asked:
users.map {|user| user.orders}

Preload joined associations matching a condition with Rails

I want to display only the users who have a given skill and the following query works properly:
@users.joins(:personal_skills).where(personal_skills: search_conditions).distinct
Now in the search results, next to each user, I want to display the personal_skill that matched the where condition.
I can simply use user.personal_skills.where(search_conditions) for each user but that would cause the N+1 query problem.
How can I avoid that?
I mean, the Rails-way, otherwise just iterating over the returned rows would accomplish the task. Indeed each row contains both user data and the joined skill data: the problem is related to the object relational mapping.
Simply substituting joins with includes is not a solution because that would preload user.personal_skills and not the filtered set user.personal_skills.where(search_conditions) which is what I want to achieve.
You can start from the personal_skills and get the users from there:
PersonalSkill.joins(:user).where(search_conditions).where(user_id: @users.map(&:id))
With the result, you can group the skills by user.
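Grouping the matching skills by user can be sketched with Enumerable#group_by. SkillRow and its data are hypothetical stand-ins for the PersonalSkill records the query would return:

```ruby
# Hypothetical rows standing in for the PersonalSkill records returned by the query.
SkillRow = Struct.new(:user_id, :name)
matching_skills = [
  SkillRow.new(1, "ruby"),
  SkillRow.new(2, "sql"),
  SkillRow.new(1, "rails")
]

# Build user_id => [skills] once, so displaying each user needs no extra query.
skills_by_user = matching_skills.group_by(&:user_id)

# Look up the matching skills for one user; default to an empty list.
skill_names_for_user_1 = skills_by_user.fetch(1, []).map(&:name)
```

The lookup table is built in one pass, so rendering a user's matching skills becomes a hash lookup.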

How to remove some items from a relation?

I am loading data from two models, and once the data are loaded in the variables, then I need to remove those items from the first relation, that are not in the second one.
A sample:
users = User.all
articles = Articles.order('created_at DESC').limit(100)
I have these two variables filled with relational data. Now I need to remove from articles every item whose user_id value is not included in the users object, so that articles keeps only the items whose user_id is in the variable users.
I tried it with a loop, but it was very slow. How do I do it effectively?
EDIT:
I know there's a way to avoid doing this by building a better query, but in my case I cannot do that (although I agree that in the example above it would be possible). The thing is that I have data from the database loaded in 2 variables and I need to process them with Ruby. Is there a command for doing that?
Thank you
Assuming you have a belongs_to :user relation on the Article model, keep only the articles whose user is in users:
articles.where(user_id: users.select(:id))
This would give you at most 100, but probably fewer. If you want to return 100 with the condition (I haven't tested this, but the idea is the same: put the conditions for users in the where statement):
Articles.where(user_id: users.select(:id)).order('created_at DESC').limit(100)
The best way to do this would probably be with a SQL join. Would this work?
Articles.joins(:user).order('created_at DESC').limit(100)
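For the case described in the question's edit, where both collections are already loaded and must be processed in Ruby, a minimal sketch is to collect the user ids into a Set and filter with select. LoadedUser and LoadedArticle are hypothetical stand-ins for the loaded records:

```ruby
require "set"

# Hypothetical stand-ins for the loaded User and Article records.
LoadedUser = Struct.new(:id)
LoadedArticle = Struct.new(:id, :user_id)

loaded_users = [LoadedUser.new(1), LoadedUser.new(2)]
loaded_articles = [
  LoadedArticle.new(10, 1),
  LoadedArticle.new(11, 3), # user 3 is not among the loaded users
  LoadedArticle.new(12, 2)
]

# A Set gives fast membership checks, avoiding a nested loop over users.
known_user_ids = loaded_users.map(&:id).to_set
kept_articles = loaded_articles.select { |article| known_user_ids.include?(article.user_id) }
```

A nested loop compares every article with every user; the Set lookup makes the filter a single pass over the articles.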

ActiveRecord return the newest record per user (unique)

I've got a User model and a Card model. User has many Cards, so Card has an attribute user_id.
I want to fetch the newest single Card for each user. I've been able to do this:
Card.all.order(:user_id, :created_at)
# => gives me all the Cards, sorted by user_id then by created_at
This gets me half way there, and I could certainly iterate through these rows and grab the first one per user. But this smells really bad to me as I'd be doing a lot of this using Arrays in Ruby.
I can also do this:
Card.select('user_id, max(created_at)').group('user_id')
# => gives me user_id and created_at
...but I only get back user_ids and created_at timestamps. I can't select any other columns (including id), so what I'm getting back is worthless. I also don't understand why PG won't let me select more columns than above without putting them in the GROUP BY clause or an aggregate function.
I'd prefer to find a way to get what I want using only ActiveRecord. I'm also willing to write this query in raw SQL but that's if I can't get it done with AR. BTW, I'm using a Postgres DB, which limits some of my options.
Thanks guys.
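We LEFT JOIN the cards table to itself and keep only the rows with no match, that is, cards for which no newer card from the same user exists. The join is ON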
a) first.id != second.id
b) first.user_id = second.user_id
c) first.created_at < second.created_at
Card.joins("LEFT JOIN cards AS c ON cards.id != c.id AND c.user_id = cards.user_id AND cards.created_at < c.created_at").where('c.id IS NULL')
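The logic of that self-join can be mirrored in plain Ruby to see which rows survive. CardRow and its data are hypothetical; the inner block repeats conditions a) to c), and none? plays the role of the c.id IS NULL filter.

```ruby
# Hypothetical rows standing in for the cards table.
CardRow = Struct.new(:id, :user_id, :created_at)
card_rows = [
  CardRow.new(1, 1, Time.utc(2020, 1, 1)),
  CardRow.new(2, 1, Time.utc(2020, 2, 1)),
  CardRow.new(3, 2, Time.utc(2020, 1, 15))
]

# Keep a card only if no other card of the same user is newer,
# which is what the LEFT JOIN ... WHERE c.id IS NULL filter expresses.
newest_cards = card_rows.select do |card|
  card_rows.none? do |other|
    other.id != card.id &&
      other.user_id == card.user_id &&
      card.created_at < other.created_at
  end
end
```

Note that if two cards of one user share the same created_at, both pass this filter, in this sketch and in the SQL version alike.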
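This is a bit late, but I am working on the same problem, and I found that this works for me, provided the cards are sorted by created_at first so that the last card in each group is the newest:
Card.order(:created_at).group_by(&:user_id).map { |_user_id, cards| cards.last }
What do you think?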
I've found one solution that is suboptimal performance-wise but will work for very small datasets, when time is short or it's a hobby project:
Card.all.order(:user_id, created_at: :desc).to_a.uniq(&:user_id)
This takes the ActiveRecord::Relation results, converts them into a Ruby Array, then performs an Array#uniq on the results with a Proc. Array#uniq preserves order and keeps the first element it sees for each user_id, so the newest card has to come first within each user, hence the descending created_at.
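Because Array#uniq keeps the first element it sees for each key, the sort direction decides which card survives. A small sketch with made-up [user_id, created_on] pairs standing in for Card records:

```ruby
# Hypothetical [user_id, created_on] pairs standing in for Card records.
cards_oldest_first = [[1, "2020-01-01"], [1, "2020-02-01"], [2, "2020-01-15"]]
# Arrays compare element by element, so sort.reverse orders newest first per user.
cards_newest_first = cards_oldest_first.sort.reverse

# uniq keeps the first pair seen for each user_id (the first array element).
first_per_user_asc  = cards_oldest_first.uniq(&:first)
first_per_user_desc = cards_newest_first.uniq(&:first)
```

With ascending timestamps the surviving pair per user is the oldest one; with descending timestamps it is the newest.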
The feature is time sensitive so I'm going to use this for now, but I will be looking at something in raw SQL following @Gene's response and link.
