Rails: uniq vs. distinct - ruby-on-rails

Can someone briefly explain to me the difference in use between the methods uniq and distinct?
I've seen both used in similar contexts, but the difference isn't quite clear to me.

Rails queries act like arrays, so .uniq produces the same result as .distinct, but
.distinct is an SQL query method
.uniq is an array method
Note: In Rails 5+, Relation#uniq is deprecated; use Relation#distinct instead.
See http://edgeguides.rubyonrails.org/5_0_release_notes.html#active-record-deprecations
Hint:
Using .includes before calling .uniq/.distinct can slow down or speed up your app, because
uniq won't spawn an additional SQL query
distinct will
But both results will be the same
Example:
users = User.includes(:posts)
puts users
# First sql query for includes
users.uniq
# No sql query! (here you speed up your app)
users.distinct
# Second distinct sql query! (here you slow down your app)
This can be useful for building a performant application.
Hint:
The same applies to
.size vs .count;
.present? vs .exists?
map vs pluck

Rails 5.1 removed the uniq method from ActiveRecord::Relation; use the distinct method instead.
If you call uniq on a query, it will just convert the ActiveRecord::Relation to an Array.
You cannot continue the query chain after uniq (i.e. you cannot do User.active.uniq.subscribed; it will throw an "undefined method subscribed for Array" error).
If your DB is large and you want to fetch only the required distinct entries, it is better to use the distinct method on the ActiveRecord::Relation query.

From the documentation:
uniq(value = true)
Alias for ActiveRecord::QueryMethods#distinct

This doesn't exactly answer your question, but what I know is:
In the ActiveRecord context, uniq is just an alias for distinct, and both remove duplicates from the query result set (at one level, so to speak).
In the array context, uniq also removes duplicates when the elements are themselves arrays, because elements are compared by value. For example:
arr = [["first"], ["second"], ["first"]]
and if we do
arr.uniq
the answer will be: [["first"], ["second"]]
So even if the elements are nested arrays, it compares their contents and removes the duplicates.
Hope it helps you in some way.
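The array behaviour described above can be checked in plain Ruby, without Rails. This is a minimal sketch; the deeper-nested sample data is made up for illustration.

```ruby
# Array#uniq compares elements with #hash and #eql?, and arrays
# implement both by value, so equal nested arrays count as duplicates.
arr = [["first"], ["second"], ["first"]]
unique = arr.uniq

# Deeper nesting is compared the same way (illustrative data).
deep = [[1, [2, 3]], [1, [2, 3]], [1, [2, 4]]]
unique_deep = deep.uniq

p unique       # [["first"], ["second"]]
p unique_deep  # [[1, [2, 3]], [1, [2, 4]]]
```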

Related

tricky union query using ruby on rails/active record

I have
a = Profile.last
a.mailbox.inbox
a.mailbox.sentbox
active_conversations = [IDS OF ACTIVE CONVERSATIONS]
a.mailbox.inbox & active_conversations
returns part of what I need
I want
(a.mailbox.inbox & active_conversations) AND a.mailbox.sentbox
but I need it as SQL, so that I can order it efficiently. I want to order it by ('updated_at')
I have tried joins and other things but they don't work. The classes of a.mailbox.inbox and the sentbox are
ActiveRecord::Relation::ActiveRecord_Relation_Conversation
but
(a.mailbox.inbox & active_conversations)
is an array
edit
Something as simple as a.mailbox.inbox JOINS SOMEHOW a.mailbox.sentbox I should be able to work with, but I can't seem to figure that out either.
Instead of doing
(a.mailbox.inbox & active_conversations)
you should be able to do
a.mailbox.inbox.where('conversations.id IN (?)', active_conversations)
I believe the Conversation class (and its underlying conversations table) should be right according to the mailboxer code.
However, this gives you an ActiveRecord::Relation object instead of an array. You can transform it to pure SQL using to_sql. So I think something like this should work:
# get the SQL of both statements
inbox_sql = a.mailbox.inbox.where('conversations.id IN (?)', active_conversations).to_sql
sentbox_sql = a.mailbox.sentbox.to_sql
# use both statements in a UNION subquery aliased as the conversations table, then order the result
Conversation.from("(#{inbox_sql} UNION #{sentbox_sql}) AS conversations").order("conversations.id")

inserting variable into complex sql command

I have a set-up with multiple contests and objects. They are tied together with a has_many :through arrangement with contest_objs. contest_objs also has votes so I can have several contests including several objects. I have a complex SQL setup to calculate the current ranking. However, I need to specify the contest in the SQL select statement for the ranking. I am having difficulty doing this. This is what I got so far:
@objects = @contest.objects.select('"contest_objs"."votes" AS v, name, "objects"."id" AS id,
(SELECT COUNT(DISTINCT "oi"."object_id")
FROM contest_objs oi
WHERE ("oi"."votes") > ("contest_objs"."votes"))+1 AS vrank')
Is there any way in the selection of vrank to specify that WHERE also includes "oi"."contest_id" = @contest.id ?
Since @contest.id is an integer and does not present any risk of SQL injection, you could do the following using string interpolation:
Model.select("..... WHERE id = #{@contest.id}")
Another possible solution would be to build your subquery using ActiveRecord, and then call .to_sql in order to get the generated SQL, and insert it in your main query.
Use sanitize_sql_array:
sanitize_sql_array(['select ? from foo', 'bar'])
If you're outside a model, because the method is protected you have to do this:
ActiveRecord::Base.send(:sanitize_sql_array, ['select ? from foo', 'bar'])
http://apidock.com/rails/ActiveRecord/Sanitization/ClassMethods/sanitize_sql_array
You can insert variables into sql commands like this:
Model.select("...... WHERE id = ?", @contest.id)
Rails will escape the values for you.
Edit:
This does not work, as stated by Intrepidd in the comments; use string interpolation as he suggested in his answer. That is safe for integer parameters.
If you find yourself inserting several strings in a query, you could consider using find_by_sql, which gives you the above mentioned ? replacement, but you can't use it with chaining, so rewriting the whole query would be needed.
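Interpolation is only safe as long as the value really is an integer. One way to make that assumption explicit is to coerce the value before building the SQL fragment. The sketch below uses a hypothetical helper, contest_filter_sql, which is not part of Rails; it only illustrates the idea in plain Ruby.

```ruby
# Hypothetical helper (not a Rails API): coerce the value to an Integer
# before interpolating it, so anything that is not a plain integer raises.
def contest_filter_sql(contest_id)
  id = Integer(contest_id) # raises ArgumentError or TypeError on bad input
  %("oi"."contest_id" = #{id})
end

puts contest_filter_sql(42)
puts contest_filter_sql("42")

begin
  contest_filter_sql("42 OR 1=1")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

The resulting fragment could then be interpolated into the subquery's WHERE clause; for anything other than integers, the sanitizing approaches mentioned above are the safer choice.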

performing activerecord query on manually created array of models

In Rails 3 I can perform query on associated models:
EXAMPLE 1:
model.associated_models.where(:attribute => 1)
associated_models is an array of models.
Is it possible to perform an ActiveRecord query on a manually created array of models?
EXAMPLE 2:
[Model.create!(attribute: 1), Model.create!(attribute: 2)].where(:attribute => 1)
Just like associated_models in the first example, it's an array of models, but I guess there is something going on behind the scenes when calling associated_models.
Can I simulate this behaviour to get example 2 working?
The short answer is no, you cannot. ActiveRecord scope chains construct queries for the DB, and these cannot be interpreted for arbitrary arrays, even an array of AR objects like the one in your example.
You can 'simulate' it by either a proper db scope
Model.where(:id => array_of_ar_objects.map(&:id), :attribute => 1)
(but this is wrong, since you want to do db calls only if needed) or by using array search:
array_of_ar_objects.select { |model| model.attribute == 1 }
Also note that model.associated_models is not an Array, but an ActiveRecord::Associations::HasManyAssociation, a kind of association proxy. It is quite tricky because even its 'class' method is delegated to the array it is coerced to; this is why you were misled, I guess.
model.associated_models.class == Array
-> true
I'd recommend Array#keep_if for this task rather than squeezing an array into an ActiveRecord::Relation.
[Model.create!(attribute: 1), Model.create!(attribute: 2)].keep_if { |m| m.attribute == 1 }
http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-keep_if
(Note, Array#select! does the same thing, but I prefer keep_if to avoid confusion when reading it later, thinking that it might be related to an sql select)
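The two in-memory options from the answers above differ in one practical way: select returns a new array, while keep_if filters the receiver in place. A minimal sketch, using a hypothetical Item struct as a stand-in for a model:

```ruby
# Hypothetical stand-in for a model; a Struct is enough to show the filtering.
Item = Struct.new(:attribute)

items = [Item.new(1), Item.new(2), Item.new(1)]

# Array#select returns a new array and leaves the receiver untouched.
matching = items.select { |item| item.attribute == 1 }

# Array#keep_if filters the receiver in place and returns it.
pruned = items.dup
pruned.keep_if { |item| item.attribute == 1 }

puts matching.size # 2
puts items.size    # 3
puts pruned.size   # 2
```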

Rails: select unique values from a column

I already have a working solution, but I would really like to know why this doesn't work:
ratings = Model.select(:rating).uniq
ratings.each { |r| puts r.rating }
It selects, but it doesn't print unique values; it prints all values, including the duplicates. And it's in the documentation: http://guides.rubyonrails.org/active_record_querying.html#selecting-specific-fields
Model.select(:rating)
The result of this is a collection of Model objects. Not plain ratings. And from uniq's point of view, they are completely different. You can use this:
Model.select(:rating).map(&:rating).uniq
or this (most efficient):
Model.uniq.pluck(:rating)
Rails 5+
Model.distinct.pluck(:rating)
Update
Apparently, as of rails 5.0.0.1, it works only on "top level" queries, like above. Doesn't work on collection proxies ("has_many" relations, for example).
Address.distinct.pluck(:city) # => ['Moscow']
user.addresses.distinct.pluck(:city) # => ['Moscow', 'Moscow', 'Moscow']
In this case, deduplicate after the query
user.addresses.pluck(:city).uniq # => ['Moscow']
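Why uniq can see every loaded object as different, even when the ratings repeat, can be sketched in plain Ruby. The RatingRow class below is a hypothetical stand-in for a model object, on the assumption that uniq is applied to already-loaded objects rather than translated to DISTINCT.

```ruby
# Hypothetical plain-Ruby stand-in for a model row; it does not define
# #eql? or #hash, so two instances are never duplicates of each other.
class RatingRow
  attr_reader :rating

  def initialize(rating)
    @rating = rating
  end
end

rows = [RatingRow.new(5), RatingRow.new(5), RatingRow.new(3)]

# uniq on the objects keeps all three, because each is a separate object.
puts rows.uniq.size # 3

# Extracting the plain values first lets uniq compare the ratings themselves.
p rows.map(&:rating).uniq # [5, 3]
```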
If you're going to use Model.select, then you might as well just use DISTINCT, as it will return only the unique values. This is better because it returns fewer rows and should be slightly faster than returning a number of rows and then telling Rails to pick the unique values.
Model.select('DISTINCT rating')
Of course, this is provided your database understands the DISTINCT keyword, and most should.
This works too.
Model.pluck("DISTINCT rating")
If you want to also select extra fields:
Model.select('DISTINCT ON (models.ratings) models.ratings, models.id').map { |m| [m.id, m.ratings] }
Model.uniq.pluck(:rating)
# SELECT DISTINCT "models"."rating" FROM "models"
This has the advantages of not using sql strings and not instantiating models
Model.select(:rating).uniq
This code works as 'DISTINCT' (not as Array#uniq) since rails 3.2
Model.select(:rating).distinct
Another way to collect unique column values with SQL:
Model.group(:rating).pluck(:rating)
If I understand correctly:
The current query
Model.select(:rating)
returns an array of objects, and you have written the query
Model.select(:rating).uniq
uniq is applied to the array of objects, and each object is a separate instance. uniq is doing its job correctly, because every object in the array is unique.
There are many ways to select distinct ratings:
Model.select('distinct rating').map(&:rating)
or
Model.select('distinct rating').collect(&:rating)
or
Model.select(:rating).map(&:rating).uniq
or
Model.select(:rating).collect(&:rating).uniq
One more thing: the first and second queries find distinct data in SQL.
Depending on the database and its collation, these queries may treat "london" and "london " as the same value, i.e. ignore the trailing space, so 'london' appears only once in the result.
The third and fourth queries find the data with SQL and then apply Ruby's uniq method to remove duplicates.
These queries treat "london" and "london " as different, so both 'london' and 'london ' appear in the result.
Please refer to the attached image for more detail and have a look at "Toured / Awaiting RFP".
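The Ruby side of the trailing-space point can be seen without a database. A minimal sketch with made-up city names: Array#uniq compares strings exactly, so stripping whitespace first is one way to get the same result a space-insensitive SQL DISTINCT might give.

```ruby
cities = ["london", "london ", "paris"]

# Exact string comparison: the trailing space makes a different value.
exact = cities.uniq

# Normalising first removes the difference before deduplicating.
normalised = cities.map(&:strip).uniq

p exact      # ["london", "london ", "paris"]
p normalised # ["london", "paris"]
```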
If anyone is looking for the same with Mongoid, that is
Model.distinct(:rating)
Some answers don't take into account that the OP wants an array of values.
Other answers don't work well if your Model has thousands of records.
That said, I think a good answer is:
Model.uniq.select(:ratings).map(&:ratings)
=> "SELECT DISTINCT ratings FROM `models` "
Because you first generate an array of Model objects (of reduced size because of the select), then you extract the only attribute those selected models have (ratings).
You can use the following Gem: active_record_distinct_on
Model.distinct_on(:rating)
Yields the following query:
SELECT DISTINCT ON ( "models"."rating" ) "models".* FROM "models"
In my scenario, I wanted a list of distinct names after ordering them by their creation date and applying offset and limit. Basically a combination of ORDER BY and DISTINCT ON.
All you need to do is put DISTINCT ON inside the pluck method, as follows:
Model.order("name, created_at DESC").offset(0).limit(10).pluck("DISTINCT ON (name) name")
This would return an array of distinct names.
Model.pluck("DISTINCT column_name")

Rails where condition using NOT NIL

Using the rails 3 style how would I write the opposite of:
Foo.includes(:bar).where(:bars=>{:id=>nil})
I want to find where id is NOT nil. I tried:
Foo.includes(:bar).where(:bars=>{:id=>!nil}).to_sql
But that returns:
=> "SELECT \"foos\".* FROM \"foos\" WHERE (\"bars\".\"id\" = 1)"
That's definitely not what I need, and almost seems like a bug in ARel.
Rails 4+
ActiveRecord 4.0 and above adds where.not so you can do this:
Foo.includes(:bar).where.not('bars.id' => nil)
Foo.includes(:bar).where.not(bars: { id: nil })
When working with scopes between tables, I prefer to leverage merge so that I can use existing scopes more easily.
Foo.includes(:bar).merge(Bar.where.not(id: nil))
Also, since includes does not always choose a join strategy, you should use references here as well, otherwise you may end up with invalid SQL.
Foo.includes(:bar)
.references(:bar)
.merge(Bar.where.not(id: nil))
Rails 3
The canonical way to do this with Rails 3:
Foo.includes(:bar).where("bars.id IS NOT NULL")
It's not a bug in ARel, it's a bug in your logic.
What you want here is:
Foo.includes(:bar).where(Bar.arel_table[:id].not_eq(nil))
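The "bug in your logic" can be seen in plain Ruby: !nil is evaluated before where ever sees it, and it is simply true. The generated SQL in the question then compares the column to that boolean, which the adapter presumably rendered as 1. A minimal sketch of the Ruby part only:

```ruby
# In Ruby, negating nil simply produces the boolean true.
negated = !nil
puts negated.inspect # true

# So the hash passed to where carries true as the value to match,
# not any notion of "is not NULL".
conditions = { bars: { id: !nil } }
puts conditions[:bars][:id].equal?(true) # true
```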
Not sure if this is helpful, but this is what worked for me in Rails 4
Foo.where.not(bar: nil)
For Rails 4:
So, what you're wanting is an inner join, so you really should just use the joins predicate:
Foo.joins(:bar)
Select * from Foo Inner Join Bars ...
But, for the record, if you want a "NOT NULL" condition simply use the not predicate:
Foo.includes(:bar).where.not(bars: {id: nil})
Select * from Foo Left Outer Join Bars on .. WHERE bars.id IS NOT NULL
Note that this syntax reports a deprecation (it talks about a string SQL snippet, but I guess the hash condition is changed to string in the parser?), so be sure to add the references to the end:
Foo.includes(:bar).where.not(bars: {id: nil}).references(:bar)
DEPRECATION WARNING: It looks like you are eager loading table(s) (one
of: ....) that are referenced in a string SQL snippet. For example:
Post.includes(:comments).where("comments.title = 'foo'")
Currently, Active Record recognizes the table in the string, and knows
to JOIN the comments table to the query, rather than loading comments
in a separate query. However, doing this without writing a full-blown
SQL parser is inherently flawed. Since we don't want to write an SQL
parser, we are removing this functionality. From now on, you must
explicitly tell Active Record when you are referencing a table from a
string:
Post.includes(:comments).where("comments.title = 'foo'").references(:comments)
With Rails 4 it's easy:
Foo.includes(:bar).where.not(bars: {id: nil})
See also:
http://guides.rubyonrails.org/active_record_querying.html#not-conditions
