Rails active record querying association with 'exists' - ruby-on-rails

I am working on an app that allows Members to take a survey (Member has a one to many relationship with Response). Response holds the member_id, question_id, and their answer.
The survey is submitted all or nothing, so if there are any records in the Response table for that Member they have completed the survey.
My question is, how do I re-write the query below so that it actually works? In SQL this would be a prime candidate for the EXISTS keyword.
def surveys_completed
members.where(responses: !nil ).count
end

You can use includes and then test if the related response(s) exists like this:
def surveys_completed
members.includes(:responses).where('responses.id IS NOT NULL')
end
Here is an alternative, with joins:
def surveys_completed
members.joins(:responses)
end
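One thing to keep in mind with the joins version: an inner join yields one row per matching response, so a member with several responses shows up several times. The following is a plain-Ruby sketch of that behaviour, not ActiveRecord code; the method name and the sample ids are made up for illustration.

```ruby
# Plain-Ruby model of an INNER JOIN between members and responses.
# The ids below are hypothetical sample data.
MEMBER_IDS = [1, 2, 3].freeze
RESPONSE_MEMBER_IDS = [1, 1, 3].freeze # member 1 answered two questions

# One output row per (member, response) pair, like members.joins(:responses).
def inner_join_rows(member_ids, response_member_ids)
  member_ids.flat_map do |id|
    response_member_ids.select { |rid| rid == id }
  end
end

rows = inner_join_rows(MEMBER_IDS, RESPONSE_MEMBER_IDS)
puts rows.size      # prints 3: member 1 is counted twice
puts rows.uniq.size # prints 2: the number of members with any response
```

In ActiveRecord the counterpart of uniq here would be distinct, for example members.joins(:responses).distinct.count.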
The solution using Rails 4:
def surveys_completed
members.includes(:responses).where.not(responses: { id: nil })
end
Alternative solution using activerecord_where_assoc:
This gem does exactly what is asked here: it uses EXISTS to apply a condition.
It works with Rails 4.1 to the most recent.
members.where_assoc_exists(:responses)
It can also do much more!

You can use the SQL EXISTS keyword in an elegant, Rails-ish manner with the Where Exists gem:
members.where_exists(:responses).count
Of course you can use raw SQL as well:
members.where("EXISTS" \
"(SELECT 1 FROM responses WHERE responses.member_id = members.id)").
count
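To see what EXISTS buys you, here is a plain-Ruby sketch of the same semi-join logic. It does not touch ActiveRecord; the Struct definitions, the helper names, and the sample data are assumptions made only for this illustration.

```ruby
# Plain-Ruby model of an EXISTS semi-join; hypothetical sample data.
Member   = Struct.new(:id, :name)
Response = Struct.new(:member_id, :question_id, :answer)

MEMBERS = [Member.new(1, "Ann"), Member.new(2, "Bob"), Member.new(3, "Cy")].freeze
RESPONSES = [
  Response.new(1, 10, "yes"),
  Response.new(1, 11, "no"),
  Response.new(3, 10, "yes")
].freeze

# EXISTS keeps a member as soon as one matching response is found,
# so each member appears at most once no matter how many responses it has.
def members_with_responses(members, responses)
  members.select { |m| responses.any? { |r| r.member_id == m.id } }
end

def surveys_completed(members, responses)
  members_with_responses(members, responses).count
end

puts surveys_completed(MEMBERS, RESPONSES) # prints 2
```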

You can also use a subquery:
members.where(id: Response.select(:member_id))
In comparison to something with includes it will not load the associated models (which is a performance benefit if you do not need them).

If you are on Rails 5 or above you should use left_joins. Otherwise a manual "LEFT OUTER JOIN" will also work. This is more performant than the includes approach mentioned in https://stackoverflow.com/a/18234998/3788753. includes will attempt to load the related objects into memory, whereas left_joins only builds a "LEFT OUTER JOIN" query.
def surveys_completed
members.left_joins(:responses).where.not(responses: { id: nil })
end
Even if there are no related records (like the query above where you are finding by nil) includes still uses more memory. In my testing I found includes uses ~33x more memory on Rails 5.2.1. On Rails 4.2.x it was ~44x more memory compared to doing the joins manually.
See this gist for the test:
https://gist.github.com/johnathanludwig/96fc33fc135ee558e0f09fb23a8cf3f1

where.missing (Rails 6.1+)
Rails 6.1 introduces a new way to check for the absence of an association - where.missing.
Please, have a look at the following code snippet:
# Before:
Post.left_joins(:author).where(authors: { id: nil })
# After:
Post.where.missing(:author)
And this is an example of SQL query that is used under the hood:
Post.where.missing(:author)
# SELECT "posts".* FROM "posts"
# LEFT OUTER JOIN "authors" ON "authors"."id" = "posts"."author_id"
# WHERE "authors"."id" IS NULL
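The LEFT OUTER JOIN plus IS NULL pattern in that SQL can be sketched in plain Ruby as follows. This is only a model of the logic, not how Rails implements it; the Struct definitions, helper names, and sample rows are assumptions for illustration.

```ruby
# Plain-Ruby model of LEFT OUTER JOIN ... WHERE authors.id IS NULL.
Post   = Struct.new(:id, :author_id)
Author = Struct.new(:id)

# Pair every post with its author, or with nil when none matches
# (a simplified left outer join for a belongs_to style lookup).
def left_join(posts, authors)
  by_id = authors.to_h { |a| [a.id, a] }
  posts.map { |post| [post, by_id[post.author_id]] }
end

# Keep only the posts whose joined author is missing (the IS NULL filter).
def posts_missing_author(posts, authors)
  left_join(posts, authors).select { |_post, author| author.nil? }.map(&:first)
end

POSTS   = [Post.new(1, 10), Post.new(2, nil), Post.new(3, 99)].freeze
AUTHORS = [Author.new(10)].freeze

p posts_missing_author(POSTS, AUTHORS).map(&:id) # prints [2, 3]
```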
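Note that where.missing returns records without the association, so in your case it counts the members who have not completed the survey (the association name is the plural :responses):
def surveys_not_completed
members.where.missing(:responses).count
end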
Thanks.
Sources:
where.missing official docs.
Pull request.
Article from the Saeloun blog.
Notes:
where.associated - a counterpart for checking for the presence of an association is also available starting from Rails 7.
See the official docs and this answer.

Related

ActiveRecord includes over join Table very slow in Rails 4.1.2

I just updated from Rails 4.0.2 to Rails 4.1.2 and realized that ActiveRecord includes has become unusably slow. What used to take just a few milliseconds now takes almost 5 minutes.
I join two tables Item and Keyword over a join table with has_and_belongs_to_many in the model. I have almost 3000 items, 3000 keywords and 8000 join table entries.
Getting all items and including all keywords used to be very fast but now takes forever:
Item.includes(:keywords)
I compared the SQL of both 4.0.2 and 4.1.2 and Rails seems to not use an inner join query in Rails 4.1.2 anymore. Database response time is very fast, so this is not the issue.
SQL for Rails 4.0.2
Item Load (5.8ms) SELECT items.* FROM items
SQL (4.6ms) SELECT keywords.*, t0.item_id AS
ar_association_key_name FROM keywords INNER JOIN items_keywords
t0 ON keywords.id = t0.keyword_id WHERE t0.item_id IN
(<id1>, ...)
SQL for Rails 4.1.2
Item Load (3.7ms) SELECT items.* FROM items
HABTM_Keywords Load (2.8ms) SELECT items_keywords.* FROM
items_keywords WHERE items_keywords.item_id IN (<id1>, ...)
Keyword Load (0.6ms) SELECT keywords.* FROM keywords WHERE
keywords.id IN (<id1>, ...)
Is this a known issue? I cannot find anything on this, so I thought it best to ask the community first before filing a bug report.
For now I changed my Rails version back to 4.0.2.
Thanks Björn
This was a bug in 4.1.2 and is fixed here:
https://github.com/rails/rails/pull/15675
You can avoid the performance regression here by explicitly referencing the association:
Item.includes(:keywords).references(:keywords)
But the problem is Rails-wide, and while there's a fix in, it's not in any release yet so I've put it in an initializer for now.
For me this is still quite slow, but only about half as slow as without the fix.
module ActiveRecord
  # FIXME: this is a fix pulled from https://github.com/rails/rails/pull/15675
  # for a serious performance issue, look to remove it on the next Rails upgrade
  module Associations::Builder
    class HasAndBelongsToMany
      def hash
        object_id.hash
      end

      def ==(other)
        equal?(other)
      end
      alias :eql? :==
    end
  end

  module AttributeMethods
    module PrimaryKey
      def id
        return unless self.class.primary_key
        sync_with_transaction_state
        read_attribute(self.class.primary_key)
      end
    end
  end
end

Add conditions to activerecord includes

First I have this:
has_one :guess
scope :with_guesses, ->{ includes(:guess) }
This loads all guesses (if they exist) for an 'X' model, running two queries. That's OK and works perfectly.
But I need to add one more condition to it.
If I do this (my first thought):
scope :with_guesses, ->(user) { includes(:guess).where("guesses.user_id = ?", user.id) }
It also runs OK, BUT in a single query (a join), which excludes results that don't have a 'guess'.
Any tips on how to use includes with conditions while KEEPING the results that don't have a 'guess'?
UPDATE
I ended up solving this by using a decorator, which I can pass the user as a context in the controller call, keeping the views clean.
I've used the Draper gem (https://github.com/drapergem/draper) to do this. You don't really need a gem to work with decorators in rails, but it can be helpful.
I didn't test it but you can use something like
User.eager_load(:guesses).where("guesses.user_id = ?", user.id)
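When you combine includes with a where on the included table, Rails loads everything in a single LEFT OUTER JOIN query, but the WHERE condition then drops the rows that have no matching guess, so the result behaves like an inner join.
So if you want a left join that keeps those rows, put the condition in the join itself, using a string SQL fragment: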
scope :with_guesses, ->(user) { joins('left outer join guesses on guesses.user_id = ?',
user.id)}
I have not tested the code above, so test it yourself; it is just one way to think about the problem.
Here is a reference:
http://guides.rubyonrails.org/active_record_querying.html#specifying-conditions-on-eager-loaded-associations
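The difference between filtering in WHERE and filtering in the join's ON clause can be sketched in plain Ruby. This is a model of the SQL semantics only, not ActiveRecord code; Match stands in for the question's unnamed 'X' model, and the match_id column, helper names, and sample rows are assumptions.

```ruby
# Plain-Ruby model: filtering in WHERE vs. in the JOIN's ON clause.
# Match is a hypothetical stand-in for the question's 'X' model.
Match = Struct.new(:id)
Guess = Struct.new(:match_id, :user_id)

MATCHES = [Match.new(1), Match.new(2)].freeze
GUESSES = [Guess.new(1, 7), Guess.new(2, 8)].freeze

# LEFT JOIN guesses ON guesses.match_id = matches.id AND guesses.user_id = ?
# Every match is kept; the guess is nil when this user has none.
def left_join_on(matches, guesses, user_id)
  matches.map do |m|
    [m, guesses.find { |g| g.match_id == m.id && g.user_id == user_id }]
  end
end

# LEFT JOIN guesses ON guesses.match_id = matches.id WHERE guesses.user_id = ?
# Rows whose guess is nil or belongs to someone else are filtered out.
def left_join_where(matches, guesses, user_id)
  matches.flat_map { |m|
    found = guesses.select { |g| g.match_id == m.id }
    found.empty? ? [[m, nil]] : found.map { |g| [m, g] }
  }.select { |_m, g| g && g.user_id == user_id }
end

p left_join_on(MATCHES, GUESSES, 7).map { |m, g| [m.id, g&.user_id] }
# prints [[1, 7], [2, nil]]
p left_join_where(MATCHES, GUESSES, 7).map { |m, g| [m.id, g.user_id] }
# prints [[1, 7]]
```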

How do I order database records in rails by most recent?

I want to order all the items in a model Item so it displays the most recent ones first. According to the Rails guide, the following code should work:
Item.order("created_at DESC")
However, when I type that (or variations of it) in the terminal, the most recent item always shows up last, and that's how they show up on my page. How do I efficiently retrieve them with the most recent first? (I'm going to display only some of them at a time on each page.)
Note that my default scope for items is oldest first.
Update:
This is the SQL I get:
SELECT "comments".* FROM "comments" ORDER BY comments.created_at ASC, created_at DESC
So I guess I shouldn't use default scopes...
The query you posted is correct
Item.order("created_at DESC")
The only reason why it would not work is if there is anything else conflicting with it. In general, the conflict is represented by a default_scope.
If you have a default scope that overrides your order, you should first unscope the query
Item.unscoped { Item.order("created_at DESC") }
If you are using default scopes, I strongly encourage you to avoid them. They are very hard to debug and unscope.
There are very few cases where default scopes make sense. You can simply pass the (default) scope at select time in the controller or create a custom method for it.
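The SQL in the question's update stacks two sort keys on the same column, and the first one wins. A plain-Ruby sketch of that priority rule is below; it models ORDER BY semantics only, and the order_rows helper and the sample rows are hypothetical.

```ruby
# Plain-Ruby model of ORDER BY created_at ASC, created_at DESC.
# Each entry is [column, direction]; earlier entries take priority.
def order_rows(rows, order_clauses)
  rows.sort do |a, b|
    result = 0
    order_clauses.each do |column, direction|
      result = a[column] <=> b[column]
      result = -result if direction == :desc
      break unless result.zero? # later keys only break ties
    end
    result
  end
end

ROWS = [{ created_at: 2 }, { created_at: 3 }, { created_at: 1 }].freeze

stacked = order_rows(ROWS, [[:created_at, :asc], [:created_at, :desc]])
p stacked.map { |r| r[:created_at] } # prints [1, 2, 3]

unscoped = order_rows(ROWS, [[:created_at, :desc]])
p unscoped.map { |r| r[:created_at] } # prints [3, 2, 1]
```

Because the default scope's ASC key comes first, the appended DESC key can only break ties, which never occur when both keys are the same column.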
I realise this is a really old question, but none of the answers show a solution that avoids raw SQL. Since Rails 4 you can pass a hash to order:
Item.order(created_at: :desc)
or using the reverse_order method:
Item.order(:created_at).reverse_order
See more at http://guides.rubyonrails.org/active_record_querying.html#ordering
and
http://guides.rubyonrails.org/active_record_querying.html#reverse-order.
I modified CDub's answer with reverse so it now works:
Item.order(:created_at).reverse
I'm still not sure why the Rails guide's way doesn't work. Also the above method doesn't work well with pagination.
Item.unscoped.order('created_at DESC') should work. Using reverse might hurt performance as the number of records grows.
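The pagination concern with reverse can be sketched in plain Ruby: reversing an already loaded page is not the same as asking for the newest page. This models the idea only; the helper names and sample values are made up.

```ruby
# Plain-Ruby model: reversing in the database vs. reversing a loaded page.
CREATED_AT = [1, 2, 3, 4, 5, 6].freeze # hypothetical timestamps, oldest first

# Like order(created_at: :desc).limit(per_page): sort descending, then take a page.
def newest_page(values, per_page)
  values.sort.reverse.first(per_page)
end

# Like order(:created_at).limit(per_page).to_a.reverse: the page is picked
# from the oldest rows first and only then flipped in memory.
def reversed_oldest_page(values, per_page)
  values.sort.first(per_page).reverse
end

p newest_page(CREATED_AT, 2)          # prints [6, 5]
p reversed_oldest_page(CREATED_AT, 2) # prints [2, 1]
```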
This is correct and tested:
@purchase_orders = current_company.purchase_orders.order(:date)
@purchase_orders = @purchase_orders.reverse_order
You can also define the default order in the Item model (Rails 4 and later require the block form):
default_scope { order('created_at DESC') }

Rails order not working with only

I can't find this documented anywhere but here is my problem: when I query via active record and use "only" the "order" clause is ignored. For example I have a Blog model:
Blog.order('id desc')
returns all the blogs with the highest ID first - as expected but:
Blog.order('id desc').only(:id)
returns only the id's (as expected) but the order clause is completely ignored, the smallest id comes first.
I have tested this with Ruby 1.9.3p327 and both Rails 4.0.0.beta1 and Rails 3.2.13 and I get the same results.
Is this a feature or a bug? To me it's a bug because the Rails crew were trumpeting how find_by_sql is not really needed but in this case it is:
Blog.find_by_sql("select id from blogs order by id desc")
which gives the correct answer.
Try using pluck instead of only. only is used to override portions of the previously formed query chain. As the docs demonstrate:
Post.where('id > 10').limit(20).order('id desc').only(:order, :where)
results in:
SELECT * FROM posts WHERE id > 10 ORDER BY id DESC
This is because the limit modification will be ignored, since the list passed to only doesn't include :limit.
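A plain-Ruby sketch of that behaviour, treating a query as a hash of clauses. This is a toy model, not how ActiveRecord stores relations; QUERY, only, and to_sql are hypothetical names for illustration.

```ruby
# Plain-Ruby model of a query as a hash of clauses; `only` keeps the named
# clauses and discards everything else.
QUERY = {
  where: "id > 10",
  limit: 20,
  order: "id desc"
}.freeze

def only(query, *clause_names)
  query.slice(*clause_names)
end

def to_sql(query)
  sql = +"SELECT * FROM posts"
  sql << " WHERE #{query[:where]}"    if query[:where]
  sql << " ORDER BY #{query[:order]}" if query[:order]
  sql << " LIMIT #{query[:limit]}"    if query[:limit]
  sql
end

puts to_sql(only(QUERY, :order, :where))
# prints SELECT * FROM posts WHERE id > 10 ORDER BY id desc
puts to_sql(only(QUERY, :id))
# prints SELECT * FROM posts
```

In this model, passing a name that is not a clause (such as :id) leaves nothing behind, which matches the question's observation that the order was dropped.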
UPDATE
If you need an actual model object returned instead of an array of id's, use select:
Blog.order('id desc').select('id')

set_table_name only works once?

I'm trying to use set_table_name to use one generic model on a couple of different tables. However, it seems as though set_table_name only works on the class once per application session. For instance, in a Rails 3 console (Ruby 1.8.7) the following happens:
GenericModel.set_table_name "table_a"
puts GenericModel.table_name # prints table_a
pp GenericModel.column_names # prints the columns associated with table_a
GenericModel.set_table_name "table_b"
puts GenericModel.table_name # prints table_b
pp GenericModel.column_names # still prints the columns associated with table_a
Currently the workaround I've found is to also add .from(table_b) so that queries don't error out with 'table_b.id doesn't exist!' because the query still thinks it's FROM table_a.
Can others reproduce the issue? Is this the intended behaviour of set_table_name?
UPDATE
Adding
Model.reset_column_information
after set_table_name forces the model to work as I expect.
Reference found in http://ar.rubyonrails.org/classes/ActiveRecord/Base.html#M000368
This is probably an undocumented limitation. Once the SHOW FIELDS FROM has been executed, which is where the results from column_names comes from, it is usually cached, at least for the duration of the request. If you must, try using the console reload! method to reset things.
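The caching effect described above can be sketched with a small plain-Ruby class. This is a toy model of memoized column metadata, not ActiveRecord's implementation; SCHEMA stands in for the database and every name in it is hypothetical.

```ruby
# Plain-Ruby model of a class that caches its column list per table.
# SCHEMA stands in for the database; all names here are hypothetical.
SCHEMA = {
  "table_a" => %w[id name],
  "table_b" => %w[id price]
}.freeze

class GenericModel
  class << self
    attr_reader :table_name

    # Changing the table name alone does not touch the cached columns.
    def table_name=(name)
      @table_name = name
    end

    # The first call "queries" the schema; later calls reuse the result.
    def column_names
      @column_names ||= SCHEMA.fetch(table_name)
    end

    # Drop the cache so the next column_names call looks at the schema again.
    def reset_column_information
      @column_names = nil
    end
  end
end

GenericModel.table_name = "table_a"
p GenericModel.column_names # prints ["id", "name"]
GenericModel.table_name = "table_b"
p GenericModel.column_names # prints ["id", "name"] (stale cache)
GenericModel.reset_column_information
p GenericModel.column_names # prints ["id", "price"]
```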
Alternatively, consider rename_table.
More info is in the ActiveRecord TableDefinition documentation.
