Rails remove duplicated associated records


Let's say I have a User that has_many :tags, and I would like to remove all of @user's tags that have a duplicated name. For example,
@user.tags #=> [<Tag name: 'A'>, <Tag name: 'A'>, <Tag name: 'B'>]
I would like to keep only the tags with unique names and delete the rest from the database.
I know I could pull out a list of unique tag names from the user's tags, remove all of the user's tags, and re-create them with only the unique names, but wouldn't that be inefficient?
On the other hand, select won't work, as it returns only the selected column. uniq also won't work:
@user.tags.uniq #=> returns all tags
Is there a more efficient way?
UPDATE:
I would like to do this in a migration.

This method will give you an ActiveRecord::Relation with the duplicate tags:
class Tag < ApplicationRecord
belongs_to :user
def self.duplicate_tags
unique = self.select('DISTINCT ON(tags.name, tags.user_id) tags.id')
.order(:name, :user_id, :id)
self.where.not(id: unique)
end
end
It actually runs as a single query, roughly:
SELECT "tags".* FROM "tags"
WHERE "tags"."id" NOT IN
(SELECT DISTINCT ON(tags.name, tags.user_id) tags.id
FROM "tags"
ORDER BY "tags"."name" ASC, "tags"."user_id" ASC, "tags"."id" ASC)
You can remove the duplicates in a single query with #delete_all.
# Warning! This can't be undone!
Tag.duplicate_tags.delete_all
If you need to destroy dependent associations or call your before_* or after_destroy callbacks, use the #destroy_all method instead. But you should use this together with #in_batches to avoid running out of memory.
# Warning! This can't be undone!
Tag.duplicate_tags.in_batches do |batch|
# destroys a batch of 1000 records
batch.destroy_all
end
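The selection rule behind duplicate_tags can be checked without a database. The following is a minimal plain-Ruby sketch, not ActiveRecord code: TagRow and duplicate_rows are hypothetical names introduced only for illustration, and the sketch assumes the intended rule is "keep the row with the lowest id in each (user_id, name) group".

```ruby
# Hypothetical stand-in for rows of the tags table (not an ActiveRecord model).
TagRow = Struct.new(:id, :user_id, :name)

# Returns the rows that would count as duplicates: every row except the one
# with the lowest id within each (user_id, name) group.
def duplicate_rows(rows)
  rows.group_by { |row| [row.user_id, row.name] }
      .flat_map { |_key, group| group.sort_by(&:id).drop(1) }
end

rows = [
  TagRow.new(1, 10, "A"),
  TagRow.new(2, 10, "A"), # same user and name as id 1, so a duplicate
  TagRow.new(3, 10, "B"),
  TagRow.new(4, 11, "A")  # same name, different user, so not a duplicate
]

duplicate_ids = duplicate_rows(rows).map(&:id)
puts duplicate_ids.inspect # prints [2]
```

Only id 2 is reported, because tags with the same name that belong to different users are kept apart by including user_id in the grouping key.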

You can write a model-independent SQL query in the migration.
Here is PostgreSQL-specific migration code:
execute <<-SQL
DELETE FROM tags
WHERE id NOT IN (
SELECT DISTINCT ON(user_id, name) id FROM tags
ORDER BY user_id, name, id ASC
)
SQL
And here is a more portable SQL version:
execute <<-SQL
DELETE FROM tags
WHERE id IN (
SELECT DISTINCT t2.id FROM tags t1
INNER JOIN tags t2
ON (
t1.user_id = t2.user_id AND
t1.name = t2.name AND
t1.id < t2.id
)
)
SQL
This SQL fiddle shows different queries you can use as the sub-select in the DELETE query, depending on your goal: deleting the first, last, or all duplicates.
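To see which rows the self-join variant selects, here is a small plain-Ruby sketch that mimics the join condition on in-memory rows. ROWS, ids_to_delete, and ids_to_delete_keep_last are hypothetical names for illustration; the second method assumes that "keep the last duplicate" simply means flipping the id comparison.

```ruby
# Hypothetical in-memory rows, each laid out as [id, user_id, name].
ROWS = [
  [1, 10, "A"],
  [2, 10, "A"],
  [3, 10, "A"],
  [4, 10, "B"],
  [5, 11, "A"]
].freeze

# Mirrors the self-join: row t2 is deletable when some row t1 shares its
# user_id and name but has a smaller id (so the first row survives).
def ids_to_delete(rows)
  deletable = rows.select do |t2|
    rows.any? { |t1| t1[1] == t2[1] && t1[2] == t2[2] && t1[0] < t2[0] }
  end
  deletable.map(&:first)
end

# Variant that keeps the newest row instead: flip the id comparison.
def ids_to_delete_keep_last(rows)
  deletable = rows.select do |t2|
    rows.any? { |t1| t1[1] == t2[1] && t1[2] == t2[2] && t1[0] > t2[0] }
  end
  deletable.map(&:first)
end

puts ids_to_delete(ROWS).inspect           # prints [2, 3]
puts ids_to_delete_keep_last(ROWS).inspect # prints [1, 2]
```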

Related

Check if ActiveRecord::Relation already includes JOIN

I'm inside a method that adds a filter (user.type) to my query/relation.
Sometimes, if grouping by the user (which needs an INNER JOIN to the users table, added in another module) is selected before filtering, I receive an error:
PostgreSQL: PG::DuplicateAlias: ERROR: table name "users" specified more than once
Before the error happens, the JOIN is already in the query:
$ pry> relation.to_sql
SELECT \"posts\".* FROM \"posts\"
INNER JOIN users ON users.id = posts.user_id
WHERE \"posts\".\"created_at\" BETWEEN '2019-05-01 00:00:00'
AND '2020-05-01 23:59:59' AND \"users\".\"type\" = 'Guest'"
I want to fix it by checking whether the table is already joined inside my ActiveRecord::Relation object. I added:
def join_users
return relation if /JOIN users/.match? relation.to_sql
relation.joins('LEFT JOIN users ON users.id = posts.user_id')
end
This solution works, but I wonder: is there a better way to check whether a JOIN is already in the relation?

Perhaps you can use joins_values, which isn't documented but is a public method on ActiveRecord::Relation. It returns an array of the arguments that have been passed to joins on the current relation (association names, or raw SQL strings if the join was written by hand):
Post.joins(:user).joins_values # [:user]
Post.all.joins_values # []

For a simple (inner) join,
Post.joins(:user)
you can find it via joins_values:
Post.joins(:user).joins_values # [:user]
If the relation uses a left join,
Post.left_joins(:user)
you can find it via left_outer_joins_values. In this case joins_values is empty:
Post.left_joins(:user).joins_values # []
so check the left joins instead:
Post.left_joins(:user).left_outer_joins_values # [:user]
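Putting the two answers together, a check could look at both arrays. The sketch below is an assumption-laden illustration rather than an established Rails idiom: already_joined? is a hypothetical helper, and FakeRelation is a stand-in Struct so the code runs without a database. It assumes the relation responds to joins_values and left_outer_joins_values (the latter exists on relations in Rails 5 and later), and it does not handle nested hash joins such as joins(user: :profile).

```ruby
# Hypothetical helper: true if the relation-like object already joins the
# given association (Symbol entries) or table (hand-written SQL strings).
def already_joined?(relation, association, table_name)
  values = relation.joins_values + relation.left_outer_joins_values
  pattern = /\bJOIN\s+"?#{Regexp.escape(table_name)}"?\b/i
  values.any? do |value|
    case value
    when Symbol then value == association
    when String then value.match?(pattern)
    else false # hashes, Arel nodes, etc. are not inspected in this sketch
    end
  end
end

# Stand-in for an ActiveRecord::Relation, used only to exercise the helper.
FakeRelation = Struct.new(:joins_values, :left_outer_joins_values)

inner  = FakeRelation.new([:user], [])
left   = FakeRelation.new([], [:user])
manual = FakeRelation.new(["LEFT JOIN users ON users.id = posts.user_id"], [])
plain  = FakeRelation.new([], [])

puts already_joined?(inner, :user, "users")  # prints true
puts already_joined?(left, :user, "users")   # prints true
puts already_joined?(manual, :user, "users") # prints true
puts already_joined?(plain, :user, "users")  # prints false
```

With a real relation, join_users from the question could return the relation unchanged when the helper reports true, instead of matching against the full to_sql output.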

StatementInvalid Rails Query

I've got the following query that works:
jobs = current_location.jobs.includes(:customer).all.where(complete: complete)
However, when I add a where clause to query the first name of the customer table, I get an error.
jobs = current_location.jobs.includes(:customer).all.where(complete: complete).where("customers.first_name = ?", "Bob")
Here is the error:
PG::UndefinedTable: ERROR: missing FROM-clause entry for table "customers"
LINE 1: ...bs"."complete" = $2 AND "jobs"."status" = $3 AND (customers....
^
: SELECT "jobs".* FROM "jobs" INNER JOIN "jobs_users" ON "jobs"."id" = "jobs_users"."job_id" WHERE "jobs_users"."user_id" = $1 AND "jobs"."complete" = $2 AND "jobs"."status" = $3 AND (customers.last_name = 'Bob') ORDER BY "jobs"."start" DESC LIMIT $4 OFFSET $5
The current_location method:
def current_location
return current_user.locations.find_by(id: cookies[:current_location])
end
Location Model
has_many :jobs
has_and_belongs_to_many :customers
Job Model
belongs_to :location
belongs_to :customer
Customer Model
has_many :jobs
has_and_belongs_to_many :locations
How can I fix this issue?

includes will only join the table if the query references the association.
When using includes, you can ensure a reference to the association in two ways:
You can use the references method. This will join the table whether or not there are any query conditions (if you MUST use raw SQL as shown in your question, then this is the method you need), e.g.
current_location.jobs
.includes(:customer)
.references(:customer)
Or you can use the hash finder version of where (please note that when referencing an association in the where clause you must use the table name, in this case customers, not the association name customer):
current_location.jobs
.includes(:customer)
.where(customers: {first_name: "Bob" })
Both of these will eager load the customer for the jobs referenced.
The first option (references) will OUTER JOIN the customers table so that all the jobs are loaded even if they have no customers as long as no query conditions reference the customers table.
The second option (using where) will OUTER JOIN the customers table but given the query parameter against the customers table it will act very much like an INNER JOIN.
If you only need to search the jobs based on customer information then joins is a better choice as this will create an INNER JOIN with the customers table but will not try to load any of the customer data in the query e.g.
current_location.jobs.joins(:customer).where(customers: {first_name: "Bob" })
joins will always include the associated table regardless of a reference in the query.
Side note: the all in both of your queries is unnecessary.
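The remark that an OUTER JOIN with a condition on customers "acts very much like an INNER JOIN" can be illustrated with in-memory data. This is a plain-Ruby sketch only; CustomerRow, JobRow, and the sample rows are hypothetical and stand in for the customers and jobs tables.

```ruby
# Hypothetical in-memory stand-ins for the customers and jobs tables.
CustomerRow = Struct.new(:id, :first_name)
JobRow = Struct.new(:id, :customer_id)

customers = [CustomerRow.new(1, "Bob"), CustomerRow.new(2, "Alice")]
jobs = [JobRow.new(1, 1), JobRow.new(2, 2), JobRow.new(3, nil)]

customers_by_id = customers.to_h { |customer| [customer.id, customer] }

# LEFT OUTER JOIN: every job appears, paired with its customer or nil.
outer = jobs.map { |job| [job, customers_by_id[job.customer_id]] }

# INNER JOIN: only jobs that have a matching customer.
inner = outer.reject { |_job, customer| customer.nil? }

# A condition on the customer also drops the rows without a customer, so
# the outer join ends up returning the same jobs as the inner join.
named_bob = ->(pair) { !pair[1].nil? && pair[1].first_name == "Bob" }
outer_bob_ids = outer.select(&named_bob).map { |job, _customer| job.id }
inner_bob_ids = inner.select(&named_bob).map { |job, _customer| job.id }

puts outer.size            # prints 3
puts inner.size            # prints 2
puts outer_bob_ids.inspect # prints [1]
puts inner_bob_ids.inspect # prints [1]
```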

includes(:customer) does not necessarily join the customers table into the SQL query. You need to use joins(:customer) to force Rails to join the customers table into the SQL query and make it available to query conditions.
jobs = current_location.jobs
.joins(:customer)
.includes(:customer)
.where(complete: complete)
.where(customers: { first_name: 'Bob' })

How to write query in active record to select from two or more tables in rails 3

I don't want to use a join.
I want to manually compare a field with a field from another table,
for example:
SELECT u.user_id, t.task_id
FROM tasks t, users u
WHERE u.user_id = t.user_id
How can I write this query in Rails?

Assuming you have associations in your models, you can simply do as follows:
User.joins(:tasks).select('users.user_id, tasks.task_id')
You can also do the following:
User.includes(:tasks).where("users.id = tasks.user_id")
includes does eager loading. Check the example below, or read up on eager loading.
users = User.limit(10)
users.each do |user|
puts user.address.postcode
end
This runs 11 queries, which is known as the N+1 query problem: first you query for all the rows, then you run another query for each row. With includes, Active Record ensures that all of the specified associations are loaded using the minimum possible number of queries.
Now when you do
users = User.includes(:address).limit(10)
users.each do |user|
puts user.address.postcode
end
it will generate just 2 queries, as follows:
SELECT * FROM users LIMIT 10
SELECT addresses.* FROM addresses
WHERE (addresses.user_id IN (1,2,3,4,5,6,7,8,9,10))
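The 11-versus-2 count can be reproduced with a toy data source that counts how often it is asked for data. This is a plain-Ruby sketch under stated assumptions: FakeDb and its methods are hypothetical and only imitate the access pattern, they are not how Active Record is implemented.

```ruby
# Hypothetical in-memory "database" that counts the queries it serves.
class FakeDb
  attr_reader :query_count

  def initialize
    @query_count = 0
    @users = (1..10).map { |id| { id: id } }
    @addresses = (1..10).map { |id| { user_id: id, postcode: "PC#{id}" } }
  end

  # One query: SELECT * FROM users LIMIT n
  def users(limit)
    @query_count += 1
    @users.first(limit)
  end

  # One query per call: the address of a single user.
  def address_for(user_id)
    @query_count += 1
    @addresses.find { |address| address[:user_id] == user_id }
  end

  # One query: all addresses WHERE user_id IN (...)
  def addresses_for(user_ids)
    @query_count += 1
    @addresses.select { |address| user_ids.include?(address[:user_id]) }
  end
end

# N+1 pattern: one query for the users, then one more per user.
lazy_db = FakeDb.new
lazy_db.users(10).each { |user| lazy_db.address_for(user[:id]) }

# Eager loading pattern: one query for the users, one for all addresses.
eager_db = FakeDb.new
user_ids = eager_db.users(10).map { |user| user[:id] }
addresses_by_user = eager_db.addresses_for(user_ids)
                            .to_h { |address| [address[:user_id], address] }

puts lazy_db.query_count  # prints 11
puts eager_db.query_count # prints 2
```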
If you don't have associations yet, read on.
You should have a look at http://guides.rubyonrails.org/association_basics.html
Assuming you are trying to do an inner join: by default in Rails, when we associate two models and then query across them with joins, we get an inner join on those tables.
You have to create associations between the models. An example is given below:
class User < ActiveRecord::Base
has_many :reservations
# ... your code
end
And in Reservation:
class Reservation < ActiveRecord::Base
belongs_to :user
# ... your code
end
Now when you do
User.joins(:reservations)
the generated query looks like this:
"SELECT `users`.* FROM `users` INNER JOIN `reservations` ON `reservations`.`user_id` = `users`.`id`"
You can check the query by running User.joins(:reservations).to_sql in the console.
Hopefully that answers your question.

User.find_by_sql("YOUR SQL QUERY HERE")

You can use it as follows:
User.includes(:tasks).where("users.id = tasks.user_id").order("users.id")

Rails: How to sort many-to-many relation

I have a many-to-many relationship between a model User and Picture. These are linked by a join table called Picturization.
If I obtain the list of users of a single picture, i.e. picture.users, how can I ensure that the result is sorted by the creation time of the Picturization row (i.e. the order in which the picture was associated with each user)? How would this change if I wanted the result in order of modification?
Thanks!
Edit
Maybe something like
picture.users.where(:order => "created_at")
but this created_at refers to the created_at in picturization

Add an additional column, something like sequence, to the picturizations table and define the sort order as the default scope in your Picturization model:
default_scope :order => 'sequence ASC'
If you want the default sort order to be based on modification time, use the following default scope instead (updated_at is the timestamp column Rails maintains):
default_scope :order => 'updated_at DESC'

You can specify the table name in the order method/clause:
picture.users.order("picturizations.created_at DESC")
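What that ORDER BY asks for, expressed on in-memory data: take the join rows of one picture, sort them by their own created_at, and map each to its user. This is a plain-Ruby sketch; UserRow, PicturizationRow, and users_for_picture are hypothetical names, and the sample timestamps are made up for illustration.

```ruby
# Hypothetical in-memory stand-ins for the users and picturizations tables.
UserRow = Struct.new(:id, :name)
PicturizationRow = Struct.new(:picture_id, :user_id, :created_at)

users = [UserRow.new(1, "ann"), UserRow.new(2, "bob"), UserRow.new(3, "cy")]
picturizations = [
  PicturizationRow.new(7, 1, Time.utc(2020, 1, 3)),
  PicturizationRow.new(7, 2, Time.utc(2020, 1, 1)),
  PicturizationRow.new(7, 3, Time.utc(2020, 1, 2)),
  PicturizationRow.new(8, 1, Time.utc(2019, 1, 1)) # belongs to another picture
]

# Users of one picture, newest association first. The sort key lives on
# the join row, not on the user.
def users_for_picture(picture_id, users, picturizations)
  users_by_id = users.to_h { |user| [user.id, user] }
  picturizations
    .select { |row| row.picture_id == picture_id }
    .sort_by(&:created_at)
    .reverse
    .map { |row| users_by_id[row.user_id] }
end

names = users_for_picture(7, users, picturizations).map(&:name)
puts names.inspect # prints ["ann", "cy", "bob"]
```

Sorting by modification instead would use the join row's modification timestamp as the sort key in the same place.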

Well, in my case, I needed to sort a many-to-many relation by a column named weight in the join table. After hours of trying, I figured out two solutions.
Solution 1: the Rails way
picture.users.where(:order => "created_at")
cannot return an ActiveRecord::Relation sorted by Picturization's created_at column.
I have tried to rewrite a default_scope method in Picturization, but it does not work:
def self.default_scope
return Picturization.all.order(weight: :desc)
end
Instead, first get the ids of the sorted Picturization rows:
ids = Picturization.where(picture_id: picture.id).order(created_at: :desc).ids
Then you can get the sorted objects by using the MySQL FIELD function:
picture.users.order("field(picturizations.id, #{ids.join(",")})")
which generates SQL that looks like this:
SELECT `users`.*
FROM `users` INNER JOIN `picturizations`
ON `users`.`id` = `picturizations`.`user_id`
WHERE `picturizations`.`picture_id` = 1 -- for instance
ORDER BY field(picturizations.id, 9,18,6,8,7) -- for instance
Solution 2: the raw SQL way
You can get the answer directly by using an ORDER BY clause:
SELECT `users`.*
FROM `users` INNER JOIN `picturizations`
ON `users`.`id` = `picturizations`.`user_id`
WHERE `picturizations`.`picture_id` = 1
ORDER BY picturizations.created_at DESC

How do I get Rails ActiveRecord to generate optimized SQL?

Let's say that I have 4 models which are related in the following ways:
Schedule has foreign key to Project
Schedule has foreign key to User
Project has foreign key to Client
In my Schedule#index view I want the most optimized SQL so that I can display links to the Schedule's associated Project, Client, and User. So, I should not pull all of the columns for the Project, Client, and User; only their IDs and Name.
If I were to manually write the SQL it might look like this:
select
s.id,
s.schedule_name,
s.schedule_type,
s.project_id,
p.name project_name,
p.client_id client_id,
c.name client_name,
s.user_id,
u.login user_login,
s.created_at,
s.updated_at,
s.data_count
from
Users u inner join
Clients c inner join
Schedules s inner join
Projects p
on p.id = s.project_id
on c.id = p.client_id
on u.id = s.user_id
order by
s.created_at desc
My question is: what would the ActiveRecord code look like to get Rails 3 to generate that SQL? For example, something like:
@schedules = Schedule. # ?
I already have the associations set up in the models (i.e. has_many / belongs_to).

I think this will get you (or at least help you get) what you're looking for:
Schedule.select("schedules.id, schedules.schedule_name, projects.name as project_name").joins(:user, :project=>:client).order("schedules.created_at DESC")
should yield:
SELECT schedules.id, schedules.schedule_name, projects.name as project_name FROM `schedules` INNER JOIN `users` ON `users`.`id` = `schedules`.`user_id` INNER JOIN `projects` ON `projects`.`id` = `schedules`.`project_id` INNER JOIN `clients` ON `clients`.`id` = `projects`.`client_id` ORDER BY schedules.created_at DESC
The main problem I see in your approach is that you're looking for schedule objects but basing your initial "FROM" clause on "User" and your associations given are also on Schedule, so I built this solution based on the plain assumption that you want schedules!
I also didn't include all of your selects to save some typing, but you get the idea. You will simply have to add each one qualified with its full table name.
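For the remaining columns, one way to avoid typing the qualified names by hand is to build the select list from data. This is a minimal plain-Ruby sketch: qualified_select is a hypothetical helper, and the column and alias names are taken from the hand-written SQL in the question.

```ruby
# Hypothetical helper: builds a SELECT list in which every column is
# qualified with its table name; an optional third element is the alias.
def qualified_select(columns)
  parts = columns.map do |table, column, as_name|
    qualified = "#{table}.#{column}"
    as_name ? "#{qualified} AS #{as_name}" : qualified
  end
  parts.join(", ")
end

select_list = qualified_select([
  ["schedules", "id"],
  ["schedules", "schedule_name"],
  ["projects", "name", "project_name"],
  ["clients", "name", "client_name"],
  ["users", "login", "user_login"]
])

puts select_list
```

The resulting string could then be passed to Schedule.select in place of the literal used above.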
