Differences between `any?` and `exists?` in Ruby on Rails? - ruby-on-rails

In Ruby on Rails, there appear to be two methods to check whether a collection has any elements in it.
Namely, they are ActiveRecord::FinderMethods’ exists? and ActiveRecord::Relation’s any?. Running these in a generic query (Foo.first.bars.exists? and Foo.first.bars.any?) generated equivalent SQL. Is there any reason to use one over the other?

#any and #exists? are very different beasts but query similarly.
Mainly, #any? accepts a block — and with this block it retrieves the records in the relation, calls #to_a, calls the block, and then hits it with Enumerable#any?. Without a block, it's the equivalent to !empty? and counts the records of the relation.
#exists? always queries the database and never relies on preloaded records, and sets a LIMIT of 1. It's much more performant vs #any?. #exists? also accepts an options param as conditions to apply as you can see in the docs.

The use of ActiveRecord#any? is reduced over ActiveRecord#exists?. With any? you can check, in the case of passing a block, if certain elements in that array matches the criteria. Similar to the Enumerable#any? but don't confuse them.
The ActiveRecord#any? implements the Enumerable#any? inside the logic of its definition, by converting the Relation accessed to an array in case a block has been passed to it and yields and access the block parameters to implement in a "hand-made" way a "Ruby" any? method.
The handy else added is intended to return the negation of empty? applied to the Relation. That's why you can check in both ways if a model has or no records in it, like:
User.count # 0
User.any? # false
# SELECT 1 AS one FROM "users" LIMIT ? [["LIMIT", 1]]
User.exists? # false
# SELECT 1 AS one FROM "users" LIMIT ? [["LIMIT", 1]]
You could also check in the "any?" way, if some record attribute has a specific value:
Foo.any? { |foo| foo.title == 'foo' } # SELECT "posts".* FROM "posts"
Or to save "efficiency" by using exists? and improve your query and lines of code:
Foo.exists?(title: 'foo') # SELECT 1 AS one FROM "posts" WHERE "posts"."title" = ? LIMIT ? [["title", "foo"], ["LIMIT", 1]]
ActiveRecord#exists? offers many implementations and is intended to work in a SQL level, rather than any?, that anyways will convert the Relation what you're working with in an array if you don't pass a block.

The answers here are all based on very outdated versions. This commit from 2016 / ActiveRecord 5.1 changes empty?, which is called by any? when no block is passed, to call exists? when not preloaded. So in vaguely-modern Rails, the only difference when no block is passed is a few extra method calls and negations, and ignoring preloaded results.

I ran into a practical issue: exists? forces a DB query while any? doesn't.
user = User.new
user.skills = [Skill.new]
user.skills.any?
# => true
user.skills.exists?
# => false
Consider having factories and a before_create hook:
class User < ActiveRecord::Base
has_many :skills
before_create :ensure_skills
def ensure_skills
# Don't want users without skills
errors.add(:skills, :invalid) if !skills.exists?
end
end
FactoryBot.define do
factory :user do
skills { [association(:skill)] }
end
end
create(:user) will fail, because at the time of before_create skills are not yet persisted. Using .any? will solve this.

Related

ActiveRecord select with OR and exclusive limit

I have the need to query the database and retrieve the last 10 objects that are either active or declined. We use the following:
User.where(status: [:active, :declined]).limit(10)
Now we need to get the last 10 of each status (total of 20 users)
I've tried the following:
User.where(status: :active).limit(10).or(User.where(status: : declined).limit(10))
# SELECT "users".* FROM "users" WHERE ("users"."status" = $1 OR "users"."status" = $2) LIMIT $3
This does the same as the previous query and returns only 10 users, of mixed statuses.
How can I get the last 10 active users and the last 10 declined users with a single query?
I'm not sure that SQL allows doing what you want. First thing I would try would be to use a subquery, something like this:
class User < ApplicationRecord
scope :active, -> { where status: :active }
scope :declined, -> { where status: :declined }
scope :last_active_or_declined, -> {
where(id: active.limit(10).pluck(:id))
.or(where(id: declined.limit(10).pluck(:id))
}
end
Then somewhere else you could just do
User.last_active_or_declined()
What this does is to perform 2 different subqueries asking separately for each of the group of users and then getting the ones in the propper group ids. I would say you could even forget about the pluck(:id) parts since ActiveRecord is smart enough to add the proper select clause to your SQL, but I'm not 100% sure and I don't have any Rails project at hand where I can try this.
limit is not a permitted value for #or relationship. If you check the Rails code, the Error raised come from here:
def or!(other) # :nodoc:
incompatible_values = structurally_incompatible_values_for_or(other)
unless incompatible_values.empty?
raise ArgumentError, "Relation passed to #or must be structurally compatible. Incompatible values: #{incompatible_values}"
end
# more code
end
You can check which methods are restricted further down in the code here:
STRUCTURAL_OR_METHODS = Relation::VALUE_METHODS - [:extending, :where, :having, :unscope, :references]
def structurally_incompatible_values_for_or(other)
STRUCTURAL_OR_METHODS.reject do |method|
get_value(method) == other.get_value(method)
end
end
You can see in the Relation class here that limit is restricted:
SINGLE_VALUE_METHODS = [:limit, :offset, :lock, :readonly, :reordering,
:reverse_order, :distinct, :create_with, :skip_query_cache,
:skip_preloading]
So you will have to resort to raw SQL I'm afraid
I don't think you can do it with a single query, but you can do it with two queries, get the record ids, and then build a query using those record ids.
It's not ideal but as you're just plucking ids the impact isn't too bad.
user_ids = User.where(status: :active).limit(10).pluck(:id) + User.where(status: :declined).limit(10).pluck(id)
users = User.where(id: user_ids)
I think you can use UNION. Install active_record_union and replace or with union:
User.where(status: :active).limit(10).union(User.where(status: :declined).limit(10))

ActiveRecord pluck to SQL

I know these two statements perform the same SQL:
Using select
User.select(:email)
# SELECT `users`.`email` FROM `users`
And using pluck
User.all.pluck(:email)
# SELECT `users`.`email` FROM `users`
Now I need to get the SQL statement derived from each method. Given that the select method returns an ActiveRecord::Relation, I can call the to_sql method. However, I cannot figure out how to get the SQL statement derived from a pluck operation on an ActiveRecord::Relation object, given that the result is an array.
Please, take into account that this is a simplification of the problem. The number of attributes plucked can be arbitrarily high.
Any help would be appreciated.
You cannot chain to_sql with pluck as it doesn't return ActiveRecord::relation. If you try to do, it throws an exception like so
NoMethodError: undefined method `to_sql' for [[""]]:Array
I cannot figure out how to get the SQL statement derived from a pluck
operation on an ActiveRecord::Relation object, given that the result
is an array.
Well, as #cschroed pointed out in the comments, they both(select and pluck) perform same SQL queries. The only difference is that pluck return an array instead of ActiveRecord::Relation. It doesn't matter how many attributes you are trying to pluck, the SQL statement will be same as select
Example:
User.select(:first_name,:email)
#=> SELECT "users"."first_name", "users"."email" FROM "users"
Same for pluck too
User.all.pluck(:first_name,:email)
#=> SELECT "users"."first_name", "users"."email" FROM "users"
So, you just need to take the SQL statement returned by the select and believe that it is the same for the pluck. That's it!
You could monkey-patch the ActiveRecord::LogSubscriber class and provide a singleton that would register any active record queries, even the ones that doesn't return ActiveRecord::Relation objects:
class QueriesRegister
include Singleton
def queries
#queries ||= []
end
def flush
#queries = []
end
end
module ActiveRecord
class LogSubscriber < ActiveSupport::LogSubscriber
def sql(event)
QueriesRegister.instance.queries << event.payload[:sql]
"#{event.payload[:name]} (#{event.duration}) #{event.payload[:sql]}"
end
end
end
Run you query:
User.all.pluck(:email)
Then, to retrieve the queries:
QueriesRegister.instance.queries

Order and limit clauses unexpectedly passed down to scope

(the queries here have no sensible semantic but I chose them for the sake of simplicity)
Project.limit(10).where(id: Project.select(:id))
generates as expected the following SQL query:
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."id" IN (
SELECT
"projects"."id"
FROM
"projects"
) LIMIT 10
But if I defined in my Project class the method
def self.my_filter
where(id: Project.select(:id))
end
Then
Project.limit(10).my_filter
generates the following query
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."id" IN (
SELECT
"projects"."id"
FROM
"projects" LIMIT 10
) LIMIT 10
See how the LIMIT 10 has now been also applied to the subquery.
Same issue when using a .order clause.
It happens with Rails 4.2.2 and Rails 3.2.20. It happens when the subquery is done on the Project table, it does happens if the subquery is done on another table.
Is there something I'm doing wrong here or do you think it is a Rails bug?
A workaround is to build my_filter by explicitly adding limit(nil).reorder(nil) to it but it is hackish.
EDIT: another workaround is to append the limit clause after the my_filter scope: Project.my_filter.limit(10).
This is actually a feature. Class methods work similar to scopes in ActiveRecord models.
And if you want to remove the already added scopes, you can either use unscoped, either call the method on a class directly, not on a scope:
def self.my_filter
unscoped.where(id: Project.select(:id))
end
# or
Project.my_filter
Your class method is applied in a way you may not be expecting:
Project.limit(10) # => a relation, not the Project class
.my_filter # => calling a class method on a relation
# Does, the following, suddenly:
# scoping { Project.my_filter }
# It's a relation's wrapper
From: .../ruby-2.0.0-p598/gems/activerecord-4.1.6/lib/active_record/relation.rb # line 281:
Owner: ActiveRecord::Relation
Visibility: public
Signature: scoping()
Scope all queries to the current scope.
Comment.where(post_id: 1).scoping do
Comment.first
end
# => SELECT "comments".* FROM "comments"
# WHERE "comments"."post_id" = 1 ORDER BY "comments"."id" ASC LIMIT 1
Please check unscoped if you want to remove all previous scopes (including
the default_scope) during the execution of a block.
Inside that scoping block, your class will include all the scoping rules of a relation it was built from into all queries, as scoping will enforce context. This is done so class methods can be properly chained, while still retaining the correct self. Of course, when you try using a class method inside the class method, stuff blows up.
In your first, "expected outcome" example, where is "natively" defined on relations, so no scope enforcement takes place: it's just not necessary.
Yeah, documentation hints that you can use unscoped in your nested query, like so:
def my_filter
where(id: Project.unscoped.select(:id))
end
...since that's where you need the "bare basis". Or, as you've already found out, you can just place limit at the end:
Project.my_filter.limit(10)
...here, at the time my_filter gets to execute, scoping will do effectively nothing: there will be no context built up to this point.

ActiveRecord none? method fires query

I have this code
def evaluate(collection)
if collection.none?
[]
else
collection.group(#group).pluck(*#columns)
end
end
The collection is an ActiveRecord::Relation object - for e.g. User.where(:name => 'Killer')
Now sometimes I also pass the Rails 4 none relation Users.none, that's why the check for none. If I do not check for none?, the call to pluck throws an arguments exception.
The problem is whenever I query any relation for none? it executes the query. See here:
> User.where(id: 1).none?
User Load (0.2ms) SELECT "users".* FROM "v1_passengers" WHERE "users"."id" = 1
=> false
> User.where(id: 1).none.none?
=> true
I do not want to execute to query just to check for none. Any workarounds?
Update: The none? method is actually array method thats why the query is executed. It's like calling to_a on the relation. What I want to know is how to figure out if the relation is a none
Found one method to do this without firing query. When you call none on a relation it appends the ActiveRecord::NullRelation to the extending_values array of the relation:
> User.where(id: 1).extending_values.include?(ActiveRecord::NullRelation)
=> false
> User.where(id: 1).none.extending_values.include?(ActiveRecord::NullRelation)
=> true
Not sure you can, there's no distinguishing feature between a Null Relation and an Actual Relation. Perhaps go down the rescue route:
begin
collection.group(#group).pluck(*#columns)
rescue #add exact Exception to catch
[]
end
Not exactly clean but gets round the problem

find vs find_by vs where

I am new to rails. What I see that there are a lot of ways to find a record:
find_by_<columnname>(<columnvalue>)
find(:first, :conditions => { <columnname> => <columnvalue> }
where(<columnname> => <columnvalue>).first
And it looks like all of them end up generating exactly the same SQL. Also, I believe the same is true for finding multiple records:
find_all_by_<columnname>(<columnvalue>)
find(:all, :conditions => { <columnname> => <columnvalue> }
where(<columnname> => <columnvalue>)
Is there a rule of thumb or recommendation on which one to use?
where returns ActiveRecord::Relation
Now take a look at find_by implementation:
def find_by
where(*args).take
end
As you can see find_by is the same as where but it returns only one record. This method should be used for getting 1 record and where should be used for getting all records with some conditions.
Edit:
This answer is very old and other, better answers have come up since this post was made. I'd advise looking at the one posted below by #Hossam Khamis for more details.
Use whichever one you feel suits your needs best.
The find method is usually used to retrieve a row by ID:
Model.find(1)
It's worth noting that find will throw an exception if the item is not found by the attribute that you supply. Use where (as described below, which will return an empty array if the attribute is not found) to avoid an exception being thrown.
Other uses of find are usually replaced with things like this:
Model.all
Model.first
find_by is used as a helper when you're searching for information within a column, and it maps to such with naming conventions. For instance, if you have a column named name in your database, you'd use the following syntax:
Model.find_by(name: "Bob")
.where is more of a catch all that lets you use a bit more complex logic for when the conventional helpers won't do, and it returns an array of items that match your conditions (or an empty array otherwise).
Model.find
1- Parameter: ID of the object to find.
2- If found: It returns the object (One object only).
3- If not found: raises an ActiveRecord::RecordNotFound exception.
Model.find_by
1- Parameter: key/value
Example:
User.find_by name: 'John', email: 'john#doe.com'
2- If found: It returns the object.
3- If not found: returns nil.
Note: If you want it to raise ActiveRecord::RecordNotFound use find_by!
Model.where
1- Parameter: same as find_by
2- If found: It returns ActiveRecord::Relation containing one or more records matching the parameters.
3- If not found: It return an Empty ActiveRecord::Relation.
There is a difference between find and find_by in that find will return an error if not found, whereas find_by will return null.
Sometimes it is easier to read if you have a method like find_by email: "haha", as opposed to .where(email: some_params).first.
Since Rails 4 you can do:
User.find_by(name: 'Bob')
which is the equivalent find_by_name in Rails 3.
Use #where when #find and #find_by are not enough.
The accepted answer generally covers it all, but I'd like to add something,
just incase you are planning to work with the model in a way like updating, and you are retrieving a single record(whose id you do not know), Then find_by is the way to go, because it retrieves the record and does not put it in an array
irb(main):037:0> #kit = Kit.find_by(number: "3456")
Kit Load (0.9ms) SELECT "kits".* FROM "kits" WHERE "kits"."number" =
'3456' LIMIT 1
=> #<Kit id: 1, number: "3456", created_at: "2015-05-12 06:10:56",
updated_at: "2015-05-12 06:10:56", job_id: nil>
irb(main):038:0> #kit.update(job_id: 2)
(0.2ms) BEGIN Kit Exists (0.4ms) SELECT 1 AS one FROM "kits" WHERE
("kits"."number" = '3456' AND "kits"."id" != 1) LIMIT 1 SQL (0.5ms)
UPDATE "kits" SET "job_id" = $1, "updated_at" = $2 WHERE "kits"."id" =
1 [["job_id", 2], ["updated_at", Tue, 12 May 2015 07:16:58 UTC +00:00]]
(0.6ms) COMMIT => true
but if you use where then you can not update it directly
irb(main):039:0> #kit = Kit.where(number: "3456")
Kit Load (1.2ms) SELECT "kits".* FROM "kits" WHERE "kits"."number" =
'3456' => #<ActiveRecord::Relation [#<Kit id: 1, number: "3456",
created_at: "2015-05-12 06:10:56", updated_at: "2015-05-12 07:16:58",
job_id: 2>]>
irb(main):040:0> #kit.update(job_id: 3)
ArgumentError: wrong number of arguments (1 for 2)
in such a case you would have to specify it like this
irb(main):043:0> #kit[0].update(job_id: 3)
(0.2ms) BEGIN Kit Exists (0.6ms) SELECT 1 AS one FROM "kits" WHERE
("kits"."number" = '3456' AND "kits"."id" != 1) LIMIT 1 SQL (0.6ms)
UPDATE "kits" SET "job_id" = $1, "updated_at" = $2 WHERE "kits"."id" = 1
[["job_id", 3], ["updated_at", Tue, 12 May 2015 07:28:04 UTC +00:00]]
(0.5ms) COMMIT => true
Apart from accepted answer, following is also valid
Model.find() can accept array of ids, and will return all records which matches.
Model.find_by_id(123) also accept array but will only process first id value present in array
Model.find([1,2,3])
Model.find_by_id([1,2,3])
The answers given so far are all OK.
However, one interesting difference is that Model.find searches by id; if found, it returns a Model object (just a single record) but throws an ActiveRecord::RecordNotFound otherwise.
Model.find_by is very similar to Model.find and lets you search any column or group of columns in your database but it returns nil if no record matches the search.
Model.where on the other hand returns a Model::ActiveRecord_Relation object which is just like an array containing all the records that match the search. If no record was found, it returns an empty Model::ActiveRecord_Relation object.
I hope these would help you in deciding which to use at any point in time.
Suppose I have a model User
User.find(id)
Returns a row where primary key = id. The return type will be User object.
User.find_by(email:"abc#xyz.com")
Returns first row with matching attribute or email in this case. Return type will be User object again.
Note :- User.find_by(email: "abc#xyz.com") is similar to User.find_by_email("abc#xyz.com")
User.where(project_id:1)
Returns all users in users table where attribute matches.
Here return type will be ActiveRecord::Relation object. ActiveRecord::Relation class includes Ruby's Enumerable module so you can use it's object like an array and traverse on it.
Both #2s in your lists are being deprecated. You can still use find(params[:id]) though.
Generally, where() works in most situations.
Here's a great post: https://web.archive.org/web/20150206131559/http://m.onkey.org/active-record-query-interface
The best part of working with any open source technology is that you can inspect length and breadth of it.
Checkout this link
find_by ~> Finds the first record matching the specified conditions. There is no implied ordering so if order matters, you should specify it yourself. If no record is found, returns nil.
find ~> Finds the first record matching the specified conditions , but if no record is found, it raises an exception but that is done deliberately.
Do checkout the above link, it has all the explanation and use cases for the following two functions.
I will personally recommend using
where(< columnname> => < columnvalue>)

Resources