ActiveRecord: Filtering duplicates

Given this table:
Users
| id | name | active |
| 1 | bob | true |
| 2 | bob | false |
| 3 | alice | false |
How can I query this table using ActiveRecord (Rails 4.2, PostgreSQL), if the resulting relation should:
have all attributes populated
not contain duplicate names
prefer active records over inactive ones
be capable of calling .count, where count returns an integer
remain an ActiveRecord::Relation
The correct result set should look like this:
Users
| id | name | active |
| 1 | bob | true |
| 3 | alice | false |
What I tried so far:
# Works for the result set, but raises when calling .count
User.select('DISTINCT ON (users.name) *')
  .order('users.name, users.active DESC')

This should work
User.select('DISTINCT name').order('name, active DESC').count

Maybe
User.where.not(name: nil).where.not(active: nil)
This returns all users with a populated name and active boolean (ids will always be populated)
Then on that we call
.uniq_by(:name)
This gives us only records with a unique name
And finally
.sort_by { |a| a.active ? 0 : 1 }
Puts records with active: true at the start (sort_by sorts ascending, so active records need the lower key). So the method is:
User.where.not(name: nil).where.not(active: nil).uniq_by(:name).sort_by { |a| a.active ? 0 : 1 }
On this query we can call count to get the number of objects.
EDIT:
To keep it as an ActiveRecord relation, change uniq_by to uniq:
User.where.not(name: nil).where.not(active: nil).uniq(:name).sort_by { |a| a.active ? 0 : 1 }
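If the de-duplication is done in Ruby rather than in SQL, the order of the two steps matters: Array#uniq keeps the first element it sees for each key, so the records have to be sorted with the active ones first before uniq is called. Here is a minimal sketch using plain Struct objects in place of User records. UserRow and dedupe_preferring_active are made-up names for illustration, and the result is an Array, not an ActiveRecord::Relation.

```ruby
# Hypothetical stand-in for a User record; no database is involved.
UserRow = Struct.new(:id, :name, :active)

ROWS = [
  UserRow.new(1, 'bob', true),
  UserRow.new(2, 'bob', false),
  UserRow.new(3, 'alice', false)
].freeze

# Sort active rows first (ties broken by id), then keep the first row per name.
def dedupe_preferring_active(rows)
  rows.sort_by { |row| [row.active ? 0 : 1, row.id] }.uniq(&:name)
end

result = dedupe_preferring_active(ROWS)
puts result.map(&:id).inspect # prints [1, 3]
puts result.count             # prints 2
```

This loads every row into memory, so it only makes sense for small tables.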

Since you're using PostgreSQL, you can use DISTINCT ON. Here's the SQL:
select distinct on (name) * from users order by name, active desc
And in Active Record:
>> User.order(name: :desc, active: :desc).select("distinct on (name) *")
User Load (2.5ms) SELECT distinct on (name) * FROM "users" ORDER BY "users"."name" DESC, "users"."active" DESC LIMIT $1 [["LIMIT", 11]]
=> #<ActiveRecord::Relation [#<User id: 2, name: "bob", active: true, created_at: "2017-09-15 03:58:23", updated_at: "2017-09-15 03:58:23">, #<User id: 3, name: "alice", active: false, created_at: "2017-09-15 03:58:35", updated_at: "2017-09-15 03:58:35">]>

Related

How to access ActiveRecords associated Objects

I have two tables, qip_changes and mac_addresses, which are connected by a belongs_to relation. When I try to access the mac_addresses table from a QipChange object, I run into "undefined method `MyDbColumName=' for nil:NilClass".
I can fetch my data, but I am not able to write into the associated table:
2.4.1 :044 > i = QipChange.first
QipChange Load (0.5ms) SELECT "qip_changes".* FROM "qip_changes" ORDER BY
"qip_changes"."id" ASC LIMIT $1 [["LIMIT", 1]]
=> #<QipChange id: 2, created_at: "2018-08-21 08:31:48", updated_at:
"2018-08-21 08:31:48", tenant: "BMC-Test", object_type: "Subnet", action:
"add", object_data: "blabalalaalla", implementation_status: "started",
server_response: "", user_id: 1, user_cache: "test">
2.4.1 :045 > i.mac_address
MacAddress Load (0.6ms) SELECT "mac_addresses".* FROM "mac_addresses"
WHERE "mac_addresses"."qip_change_id" = $1 LIMIT $2 [["qip_change_id", 2],
["LIMIT", 1]]
=> nil
2.4.1 :046 > i.mac_address.mac_address = "asocnasc"
NoMethodError: undefined method `mac_address=' for nil:NilClass
from (irb):46
I guess I am just using the wrong syntax to access the associated table, but I cannot find anything on the web about how to do so.
My models:
class QipChange < ApplicationRecord
  belongs_to :user
  has_one :mac_address
end
class MacAddress < ApplicationRecord
  belongs_to :qip_change
end
From psql:
qipmatedevel=# \d mac_addresses
Table "public.mac_addresses"
Column | Type | Modifiers
---------------+-----------------------------+------------------------------------------------------------
id | bigint | not null default nextval('mac_addresses_id_seq'::regclass)
mac_address | character varying |
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
qip_change_id | bigint |
Indexes:
"mac_addresses_pkey" PRIMARY KEY, btree (id)
"index_mac_addresses_on_qip_change_id" btree (qip_change_id)
Foreign-key constraints:
"fk_rails_8002186396" FOREIGN KEY (qip_change_id) REFERENCES qip_changes(id)
qipmatedevel=# \d qip_changes
Table "public.qip_changes"
Column | Type | Modifiers
-----------------------+-----------------------------+----------------------------------------------------------
id | bigint | not null default nextval('qip_changes_id_seq'::regclass)
created_at | timestamp without time zone | not null
updated_at | timestamp without time zone | not null
tenant | character varying |
object_type | character varying |
action | character varying |
object_data | character varying |
implementation_status | character varying |
server_response | character varying |
user_id | bigint |
user_cache | character varying | not null
Indexes:
"qip_changes_pkey" PRIMARY KEY, btree (id)
"index_qip_changes_on_user_id" btree (user_id)
Foreign-key constraints:
"fk_rails_3207a22986" FOREIGN KEY (user_id) REFERENCES users(id)
Referenced by:
TABLE "mac_addresses" CONSTRAINT "fk_rails_8002186396" FOREIGN KEY (qip_change_id) REFERENCES qip_changes(id)
Edit: I am curious whether my association is working. Since a query is produced once I enter "i.mac_address", it seems that it is. Am I right?
Edit:
i.create_mac_address(mac_address: "aconasic")
That worked to create the mac_address. What do I need to do to access it starting from i?
The following works once the mac_address has been created:
i.mac_address.mac_address

If you want to update an attribute in Rails, call update on the record:
RECORD.update(ATTRIBUTE_NAME: "")
For example, if you want to update the mac_address, you need to do
i.mac_address.update(mac_address: "asocnasc")
The query you posted returned nil, though, so it does not look like a mac_address record is associated with this QipChange yet.
If you want to create a new mac_address for a QipChange, you need to do
i.create_mac_address(mac_address: MAC_ADDRESS)
Here i is your QipChange.
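The two cases above (no associated record yet vs. an existing one) can be folded into one helper. The sketch below uses plain-Ruby stand-ins rather than the real models, so QipChangeStub, MacAddressStub and assign_mac are hypothetical names. It only mirrors the behaviour seen in the console session: the reader returns nil until create_mac_address has been called.

```ruby
# Plain-Ruby stand-ins for the two models; these are not ActiveRecord classes.
MacAddressStub = Struct.new(:mac_address)

class QipChangeStub
  # Like the has_one reader in the question, this is nil until a record exists.
  attr_reader :mac_address

  # Mimics the create_mac_address helper used in the question's second edit.
  def create_mac_address(attrs)
    @mac_address = MacAddressStub.new(attrs.fetch(:mac_address))
  end
end

# Create the associated object on first use, otherwise update it in place.
def assign_mac(change, value)
  if change.mac_address.nil?
    change.create_mac_address(mac_address: value)
  else
    change.mac_address.mac_address = value
  end
  change.mac_address
end

change = QipChangeStub.new
puts assign_mac(change, 'asocnasc').mac_address # prints asocnasc
puts assign_mac(change, 'aconasic').mac_address # prints aconasic
```

With the real models the else branch would presumably call update, as in the answer above, so that the change is saved.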

Select unique record with latest created date

| id | user_id | created_at (datetime) |
| 1 | 1 | 17 May 2016 10:31:34 |
| 2 | 1 | 17 May 2016 12:41:54 |
| 3 | 2 | 18 May 2016 01:13:57 |
| 4 | 1 | 19 May 2016 07:21:24 |
| 5 | 2 | 20 May 2016 11:23:21 |
| 6 | 1 | 21 May 2016 03:41:29 |
How can I get the latest record (by created_at) for each unique user_id, which would be records 5 and 6 in the above case?
What I have tried so far
So far I am trying to use group_by to return a hash like this:
Table.all.group_by(&:user_id)
#{1 => [record 1, record 2, record 4, record 6], etc}
How do I then select the record with the latest date from each group? Thanks.
Updated solution
Thanks to Gordon's answer, I am now using find_by_sql to run a raw SQL query in Rails.
@table = Table.find_by_sql("SELECT DISTINCT ON (user_id) *
  FROM tables
  ORDER BY user_id, created_at DESC")
# To include eager loading with find_by_sql, we can add this
ActiveRecord::Associations::Preloader.new.preload(@table, :user)

In Postgres, you can use DISTINCT ON:
SELECT DISTINCT ON (user_id) *
FROM tables
ORDER BY user_id, created_at DESC;
I am not sure how to express this in Ruby.

Table
.select('user_id, MAX(created_at) AS created_at')
.group(:user_id)
.order('created_at DESC')
Notice that created_at is passed as a string in the order call, since it is the result of an aggregate function, not a column value.

1) Extract the unique user ids from the table
Table.pluck(:user_id).uniq
2) Find all records of each user.
Table.pluck(:user_id).uniq.map { |user_id| Table.where(user_id: user_id) }
3) Select the latest created
Table.pluck(:user_id).uniq.map { |user_id| Table.where(user_id: user_id).order(:created_at).last.created_at }
4) Return result in form of: [[id, user_id], [id, user_id] ... ]
Table.pluck(:user_id).uniq.map { |user_id| [Table.where(user_id: user_id).order(:created_at).last.id, user_id] }
This should return [[6, 1], [5, 2]]
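The group_by idea from the question can also be finished in plain Ruby by taking the newest record of each group with max_by. This is a rough in-memory sketch: LogRow and latest_per_user are made-up names, the timestamps are assumed to be UTC, and for a large table the DISTINCT ON query is likely the better choice since it avoids loading every row.

```ruby
# Hypothetical stand-in for a row of the table in the question.
LogRow = Struct.new(:id, :user_id, :created_at)

LOG_ROWS = [
  LogRow.new(1, 1, Time.utc(2016, 5, 17, 10, 31, 34)),
  LogRow.new(2, 1, Time.utc(2016, 5, 17, 12, 41, 54)),
  LogRow.new(3, 2, Time.utc(2016, 5, 18, 1, 13, 57)),
  LogRow.new(4, 1, Time.utc(2016, 5, 19, 7, 21, 24)),
  LogRow.new(5, 2, Time.utc(2016, 5, 20, 11, 23, 21)),
  LogRow.new(6, 1, Time.utc(2016, 5, 21, 3, 41, 29))
].freeze

# Group the rows by user and keep the most recently created row of each group.
def latest_per_user(rows)
  rows.group_by(&:user_id).map { |_user_id, group| group.max_by(&:created_at) }
end

puts latest_per_user(LOG_ROWS).map(&:id).sort.inspect # prints [5, 6]
```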

How to find posts tagged with more than one tag in Rails and Postgresql

I have the models Post, Tag, and PostTag. A post has many tags through post tags. I want to find posts that are exclusively tagged with more than one tag.
has_many :post_tags
has_many :tags, through: :post_tags
For example, given this data set:
posts table
--------------------
id | title |
--------------------
1 | Carb overload |
2 | Heart burn |
3 | Nice n Light |
tags table
-------------
id | name |
-------------
1 | tomato |
2 | potato |
3 | basil |
4 | rice |
post_tags table
-----------------------
id | post_id | tag_id |
-----------------------
1 | 1 | 1 |
2 | 1 | 2 |
3 | 2 | 1 |
4 | 2 | 3 |
5 | 3 | 1 |
I want to find posts tagged with tomato AND basil. This should return only the "Heart burn" post (id 2). Likewise, if I query for posts tagged with tomato AND potato, it should return the "Carb overload" post (id 1).
I tried the following:
Post.joins(:tags).where(tags: { name: ['basil', 'tomato'] })
SQL
SELECT "posts".* FROM "posts"
INNER JOIN "post_tags" ON "post_tags"."post_id" = "posts"."id"
INNER JOIN "tags" ON "tags"."id" = "post_tags"."tag_id"
WHERE "tags"."name" IN ('basil', 'tomato')
This returns all three posts because all share the tag tomato. I also tried this:
Post.joins(:tags).where(tags: { name: 'basil' }).where(tags: { name: 'tomato' })
SQL
SELECT "posts".* FROM "posts"
INNER JOIN "post_tags" ON "post_tags"."post_id" = "posts"."id"
INNER JOIN "tags" ON "tags"."id" = "post_tags"."tag_id"
WHERE "tags"."name" = 'basil' AND "tags"."name" = 'tomato'
This returns no records.
How can I query for posts tagged with multiple tags?

You may want to review the possible ways to write this kind of query in this answer for applying conditions to multiple rows in a join. Here is one possible option for implementing your query in Rails using 1B, the sub-query approach...
Define a query in the PostTag model that will grab up the Post ID values for a given Tag name:
# PostTag.rb
def self.post_ids_for_tag(tag_name)
  joins(:tag).where(tags: { name: tag_name }).select(:post_id)
end
Define a query in the Post model that will grab up the Post records for a given Tag name, using a sub-query structure:
# Post.rb
def self.for_tag(tag_name)
  where("id IN (#{PostTag.post_ids_for_tag(tag_name).to_sql})")
end
Then you can use a query like this:
Post.for_tag("basil").for_tag("tomato")
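Chaining for_tag adds one "id IN (sub-query)" condition per tag, which amounts to intersecting the sets of post IDs for each tag. The logic can be checked against the sample data with plain arrays. This is only an in-memory illustration: the constants and post_ids_for_all_tags are hypothetical names, and no SQL is executed.

```ruby
# Sample data from the question, held in plain Ruby structures.
POST_TITLES = { 1 => 'Carb overload', 2 => 'Heart burn', 3 => 'Nice n Light' }.freeze
TAG_IDS     = { 'tomato' => 1, 'potato' => 2, 'basil' => 3, 'rice' => 4 }.freeze
POST_TAGS   = [[1, 1], [1, 2], [2, 1], [2, 3], [3, 1]].freeze # [post_id, tag_id]

# Post IDs carrying one tag (the role of PostTag.post_ids_for_tag above).
def post_ids_for_tag(tag_name)
  tag_id = TAG_IDS.fetch(tag_name)
  POST_TAGS.select { |_post_id, t| t == tag_id }.map(&:first)
end

# Intersect the per-tag ID lists, like the chained for_tag conditions.
def post_ids_for_all_tags(*tag_names)
  tag_names.map { |name| post_ids_for_tag(name) }.reduce(:&) || []
end

puts post_ids_for_all_tags('tomato', 'basil').inspect  # prints [2]
puts post_ids_for_all_tags('tomato', 'potato').inspect # prints [1]
```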

Use method .includes, like this:
Item.where(xpto: "test")
.includes({:orders =>[:suppliers, :agents]}, :manufacturers)
Documentation to .includes here.

How to count group by rows in rails?

When I use User.count(:all, :group => "name"), I get multiple rows, but it's not what I want. What I want is the count of the rows. How can I get it?

Currently (18.03.2014 - Rails 4.0.3) this is correct syntax:
Model.group("field_name").count
It returns hash with counts as values
e.g.
SurveyReport.find(30).reports.group("status").count
#=> {
"pdf_generated" => 56
}

User.count will give you the total number of users and translates to the following SQL: SELECT count(*) AS count_all FROM "users"
User.count(:all, :group => 'name') will give you the list of unique names, along with their counts, and translates to this SQL: SELECT count(*) AS count_all, name AS name FROM "users" GROUP BY name
I suspect you want option 1 above, but I'm not clear on what exactly you want/need.

Perhaps you want to count the distinct user names?
User.count(:name, :distinct => true)
(in Rails 4 and later: User.distinct.count(:name))
would return 3 if you have users named John, John, Jane and Joey (for example) in the database.
________
| name |
|--------|
| John |
| John |
| Jane |
| Joey |
|________|

Try using User.find(:all, :group => "name").count
Good luck!

I found an odd way that seems to work: counting the rows returned from the grouped counts.
User Table Example
________
| name |
|--------|
| Bob |
| Bob |
| Joe |
| Susan |
|________|
Counts in the Groups
User.group(:name).count
# SELECT COUNT(*) AS count_all
# FROM "users"
# GROUP BY "users"."name"
=> {
"Bob" => 2,
"Joe" => 1,
"Susan" => 1
}
Row Count from the Counts in the Groups
User.group(:name).count.count
=> 3
Something Hacky
Here's something interesting I ran into, but it's quite hacky as it will add the count to every row, and doesn't play too well in active record land. I don't remember if I was able to get this into an Arel / ActiveRecord query.
SELECT COUNT(*) OVER() AS count, COUNT(*) AS count_all, "users"."name" AS name
FROM "users"
GROUP BY "users"."name"
[
{ count: 3, count_all: 2, name: "Bob" },
{ count: 3, count_all: 1, name: "Joe" },
{ count: 3, count_all: 1, name: "Susan" }
]
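The difference between the per-group counts and the number of groups can be seen without a database. The sketch below rebuilds the same kind of hash from a plain array of names (NAMES and counts_by_name are made-up for illustration). Calling size or count on that hash gives the number of groups, which is what the question asks for.

```ruby
# The name column from the example table above.
NAMES = %w[Bob Bob Joe Susan].freeze

# Build a { name => occurrences } hash, similar in shape to group(:name).count.
def counts_by_name(names)
  names.each_with_object(Hash.new(0)) { |name, counts| counts[name] += 1 }
end

counts = counts_by_name(NAMES)
puts counts['Bob'] # prints 2
puts counts.size   # prints 3, the number of distinct groups
```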

Does it make sense to convert DB-ish queries into Rails ActiveRecord Model lingo?

mysql> desc categories;
+-------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(80) | YES | | NULL | |
+-------+-------------+------+-----+---------+----------------+
mysql> desc expenses;
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| created_at | datetime | NO | | NULL | |
| description | varchar(100) | NO | | NULL | |
| amount | decimal(10,2) | NO | | NULL | |
| category_id | int(11) | NO | MUL | 1 | |
+-------------+---------------+------+-----+---------+----------------+
Now I need the top N categories like this...
Expense.find_by_sql("SELECT categories.name, sum(amount) as total_amount
from expenses
join categories on category_id = categories.id
group by category_id
order by total_amount desc")
But this is nagging at my Rails conscience.. it seems that it may be possible to achieve the same thing via Expense.find and supplying options like :group, :joins..
Can someone translate this query into ActiveRecord Model speak ?
Is it worth it? Personally I find the SQL more readable and it gets my job done faster, maybe because I'm still learning Rails. Are there any advantages to not embedding SQL in source code (apart from being able to change DB vendors, SQL flavors, etc.)?
It seems like find_by_sql doesn't have the bind variable provision that find has. What is the workaround, e.g. if I want to limit the number of records to a user-specified limit?

Expense.find(:all,
:select => "categories.name name, sum(amount) total_amount",
:joins => "categories on category_id = categories.id",
:group => "category_id",
:order => "total_amount desc")
Hope that helps!

Seems like find_by_sql doesn't have the bind variable provision like find.
It sure does. (from the Rails docs)
# You can use the same string replacement techniques as you can with ActiveRecord#find
Post.find_by_sql ["SELECT title FROM posts WHERE author = ? AND created > ?", author_id, start_date]

Well this is the code that finally worked for me.. (Francois.. the resulting sql stmt was missing the join keyword)
def Expense.get_top_n_categories options={}
  #sQuery = "SELECT categories.name, sum(amount) as total_amount
  # from expenses
  # join categories on category_id = categories.id
  # group by category_id
  # order by total_amount desc";
  #sQuery += " limit #{options[:limit].to_i}" if !options[:limit].nil?
  #Expense.find_by_sql(sQuery)
  query_options = {:select => "categories.name name, sum(amount) total_amount",
                   :joins => "inner join categories on category_id = categories.id",
                   :group => "category_id",
                   :order => "total_amount desc"}
  query_options[:limit] = options[:limit].to_i if !options[:limit].nil?
  Expense.find(:all, query_options)
end
find_by_sql does have Rails bind variables... I don't know how I overlooked that.
Finally, is the above use of a user-specified limit a potential entry point for SQL injection, or does the to_i method call prevent that?
Thanks for all the help. I'm grateful.
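On the to_i question: String#to_i always returns an Integer, so whatever ends up in the query is a number and cannot carry SQL text. It can still be 0 or negative, though, so a small guard may be worth adding. Below is a minimal sketch. sanitize_limit and its max of 100 are hypothetical and not part of the code above.

```ruby
# Hypothetical helper: turn untrusted input into a usable LIMIT, or nil for "no limit".
def sanitize_limit(raw, max: 100)
  return nil if raw.nil?

  value = raw.to_s.to_i # non-numeric text becomes 0; trailing junk is dropped
  return nil unless value.positive?

  [value, max].min      # cap the value so a huge limit cannot be requested
end

puts sanitize_limit('10; DROP TABLE expenses') # prints 10
puts sanitize_limit('abc').inspect             # prints nil
puts sanitize_limit('5000')                    # prints 100
```

The returned value could then be assigned to query_options[:limit] in place of the bare to_i call.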
