Remove duplicated records keeping last using ActiveRecord - ruby-on-rails

I've been trying to remove the records that are duplicated (same value in the column shopify_order_id) keeping the most recent one.
I wrote it in SQL:
select orders.id from (
select shopify_order_id, min(shopify_created_at) as min_created
from orders group by shopify_order_id having count(*) > 1 limit 5000
) as keep_orders
join orders
on
keep_orders.shopify_order_id = orders.shopify_order_id and
orders.shopify_created_at <> keep_orders.min_created
and now I'm trying to get it to Active Record but can't seem to join the two parts.
The first nested select is
Order.select('shopify_order_id, MIN(shopify_created_at) as min_created').
group(:shopify_order_id).
having('count(*) > 1').
limit(5000)
but then the following doesn't work:
Order.select('orders.id').from(keep_orders, :keep_orders).
joins('orders ON keep_orders.shopify_order_id = orders.shopify_order_id').
where.not('orders.shopify_created_at = keep_orders.min_created')
it builds the query:
SELECT orders.id FROM (SELECT shopify_order_id, MIN(shopify_created_at) as min_created FROM "orders" GROUP BY "orders"."shopify_order_id" HAVING (count(*) > 1) LIMIT $1) keep_orders orders ON keep_orders.shopify_order_id = orders.shopify_order_id WHERE NOT (orders.shopify_created_at = keep_orders.min_created) ORDER BY "orders"."id" ASC LIMIT $2 [["LIMIT", 5000], ["LIMIT", 1]]
which is missing the keyword join.
Any help on how to refactor the query/do it in another way would be more than appreciated.

If you call joins with a string SQL fragment, the string is inserted into the query as-is, so you need to include the JOIN keyword (and the type of join you want) yourself:
Order.select('orders.id').from(keep_orders, :keep_orders)
.joins('JOIN orders ON keep_orders.shopify_order_id = orders.shopify_order_id')
.where.not('orders.shopify_created_at = keep_orders.min_created')
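To make the intent of the query concrete, here is a minimal plain-Ruby sketch (no database) of which ids end up selected for removal. The Row struct, the sample data, and the ids_to_delete helper are hypothetical illustrations, not part of ActiveRecord. Note that the SQL in the question keeps the row with MIN(shopify_created_at), i.e. the oldest one; keeping the most recent one, as the question describes, would correspond to MAX, which the sketch mirrors with max_by.

```ruby
require 'time'

# Hypothetical in-memory stand-in for rows of the orders table.
Row = Struct.new(:id, :shopify_order_id, :shopify_created_at)

# Returns the ids of all duplicates except the most recent row
# for each shopify_order_id.
def ids_to_delete(rows)
  rows.group_by(&:shopify_order_id).flat_map do |_order_id, group|
    next [] if group.size < 2                  # HAVING count(*) > 1
    keep = group.max_by(&:shopify_created_at)  # use min_by to keep the oldest instead
    (group - [keep]).map(&:id)
  end
end

ROWS = [
  Row.new(1, 'A', Time.parse('2020-01-01 00:00:00 UTC')),
  Row.new(2, 'A', Time.parse('2020-02-01 00:00:00 UTC')),
  Row.new(3, 'B', Time.parse('2020-01-15 00:00:00 UTC')),
  Row.new(4, 'A', Time.parse('2020-03-01 00:00:00 UTC'))
]

p ids_to_delete(ROWS) # => [1, 2]
```

In a Rails app, the ids returned by the real query could then be passed to something like Order.where(id: ids).delete_all; that last step is an assumption about how the result would be used.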

Related

How to apply sorting asc desc on 3rd level association using ransack in Ruby

I am using the ransack gem for searching, which works fine, but I have a search criterion where I want to sort orders by the product title of their line items, ascending or descending.
It does not work and returns:
ActionView::Template::Error (PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...wner_id" = $1 ORDER BY "orders"."created_at" ASC, lower(prod...
Params:
Parameters: {"filter_type"=>"", "csfname"=>"line_items_product_title_or_order_detail_name_or_shipping_address_postal_code_or_billing_address_postal_code_or_line_items_product_sku_or_line_items_product_yan_or_line_items_product_ean_or_id_cont", "q"=>{"s"=>"[\"created_at asc\",\"line_items_product_title desc\"]", "line_items_product_title_or_order_detail_name_or_shipping_address_postal_code_or_billing_address_postal_code_or_line_items_product_sku_or_line_items_product_yan_or_line_items_product_ean_or_id_cont"=>""}}
Query:
@q = Order.joins(line_items: :product).where(products: {owner_id: current_seller}).ransack(params[:q])
@seller_orders = @q.result(distinct: true)
SQL:
SELECT DISTINCT "orders".* FROM "orders" INNER JOIN "line_items" ON
"line_items"."order_id" = "orders"."id" INNER JOIN "products" ON "products"."id" =
"line_items"."product_id" LEFT OUTER JOIN "products" "products_line_items" ON
"products_line_items"."id" = "line_items"."product_id" WHERE "products"."owner_id" = $1
ORDER BY "orders"."created_at" ASC, lower(products.title) DESC
The version below works, but it returns 5 orders instead of 2, because the 2 orders contain 5 line items between them and each line item produces its own row.
How can we apply the search and sorting but keep the result distinct so that only the 2 orders are returned?
@q = Order.joins(line_items: :product).where(products: {owner_id: current_seller}).ransack(params[:q])
@seller_orders = @q.result(distinct: true).select('orders.*, lower(products.title)')
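The 5 rows are most likely explained by DISTINCT applying to the whole select list: once lower(products.title) is selected, every distinct (order, title) pair survives. The plain-Ruby sketch below illustrates this with hypothetical data and helper names. It also shows one possible way out, reducing the titles to a single sort key per order (here the smallest lowercased title); that choice of key is an assumption about the desired ordering, not something ransack does on its own.

```ruby
# Hypothetical join rows: [order_id, product title], one per line item.
LINE_ITEM_ROWS = [
  [1, 'Widget'], [1, 'Anvil'], [1, 'Bolt'],
  [2, 'Zipper'], [2, 'Crate']
]

# DISTINCT over (order, lower(title)): every distinct pair is kept.
def distinct_rows(rows)
  rows.map { |order_id, title| [order_id, title.downcase] }.uniq
end

# One row per order, using the smallest lowercased title as the sort key,
# sorted by that key in descending order.
def orders_by_min_title_desc(rows)
  rows
    .group_by { |order_id, _title| order_id }
    .map { |order_id, group| [order_id, group.map { |_id, title| title.downcase }.min] }
    .sort_by { |_order_id, key| key }
    .reverse
end

p distinct_rows(LINE_ITEM_ROWS).size       # => 5
p orders_by_min_title_desc(LINE_ITEM_ROWS) # => [[2, "crate"], [1, "anvil"]]
```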

Rails Query Multiple Params From Same Table

How can I search for multiple params? I have checkboxes in my view, so if multiple checkboxes are selected, I would like all the params selected to be chosen. I can currently only get the search to work with one param with code below.
There is a many-to-many association between the Car model and the ColourCollection model.
Controller:
@cars = Car.joins(:colour_collections).where("colour_collections.name = ?", params[:colour_collection])
The logs show this if two colours are selected (e.g. red and green), creating duplicates in the resulting queries:
(0.7ms) SELECT COUNT(*) FROM "colour_collections"
ColourCollection Load (0.5ms) SELECT "colour_collections".* FROM "colour_collections"
Car Load (2.5ms) SELECT "cars".* FROM "cars" INNER JOIN "car_colour_collections" ON "car_colour_collections"."car_id" = "cars"."id" INNER JOIN "colour_collections" ON "colour_collections"."id" = "car_colour_collections"."colour_collection_id" WHERE "colour_collections"."name" IN ('Subtle', 'Intermediate') ORDER BY "cars"."created_at" DESC
CarAttachment Load (0.5ms) SELECT "car_attachments".* FROM "car_attachments" WHERE "car_attachments"."car_id" = $1 ORDER BY "car_attachments"."id" ASC LIMIT $2 [["car_id", 21], ["LIMIT", 1]]
CACHE (0.0ms) SELECT "car_attachments".* FROM "car_attachments" WHERE "car_attachments"."car_id" = $1 ORDER BY "car_attachments"."id" ASC LIMIT $2 [["car_id", 21], ["LIMIT", 1]]
CarAttachment Load (0.5ms) SELECT "car_attachments".* FROM "car_attachments" WHERE "car_attachments"."car_id" = $1 ORDER BY "car_attachments"."id" ASC LIMIT $2 [["car_id", 20], ["LIMIT", 1]]
CACHE (0.0ms) SELECT "car_attachments".* FROM "car_attachments" WHERE "car_attachments"."car_id" = $1 ORDER BY "car_attachments"."id" ASC LIMIT $2 [["car_id", 20], ["LIMIT", 1]]
If you want to search for multiple values in a single column, for example
params[:colour_collection] = ['red','green','blue']
then you would expect your query to look like this:
SELECT cars.* FROM cars
INNER JOIN car_colour_collections cc ON cc.car_id = cars.id
INNER JOIN colour_collections s ON s.id = cc.colour_collection_id
WHERE s.name IN ('red','green','blue');
In this case the corresponding ActiveRecord statement would look like this:
Car.
joins(:colour_collections).
where(colour_collections: { name: params[:colour_collection] })
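The duplicates mentioned in the question come from the join itself: a car that matches two of the selected colours appears once per matching join row. The plain-Ruby sketch below imitates this with hypothetical data and a hypothetical matching_car_ids helper. In ActiveRecord, appending .distinct to the relation is the usual way to collapse such rows, assuming one row per car is what is wanted.

```ruby
# Hypothetical join rows: one entry per (car, colour collection) pair,
# mimicking cars joined to colour_collections.
JOIN_ROWS = [
  { car_id: 20, colour: 'red' },
  { car_id: 20, colour: 'green' },
  { car_id: 21, colour: 'green' },
  { car_id: 22, colour: 'blue' }
]

# Equivalent of WHERE colour_collections.name IN (...):
# one result per matching join row.
def matching_car_ids(rows, colours)
  rows.select { |row| colours.include?(row[:colour]) }.map { |row| row[:car_id] }
end

selected = %w[red green]
with_duplicates = matching_car_ids(JOIN_ROWS, selected)
p with_duplicates      # => [20, 20, 21] (car 20 matches both colours)
p with_duplicates.uniq # => [20, 21] (what SELECT DISTINCT would return)
```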
Rails 5 adds an or method to relations; Rails 4 does not have it, so in Rails 4 you can use a plain SQL fragment instead.
In Rails 4:
@cars = Car.
joins(:colour_collections).
where("colour_collections.name = ? OR colour_collections.type = ?", params[:colour_collection], params[:type])
In Rails 5:
@cars = Car.
joins(:colour_collections).
where("colour_collections.name = ?", params[:colour_collection]).or(Car.joins(:colour_collections).where("colour_collections.type = ?", params[:type]))
Which approach to use depends on whether you want to combine the conditions with OR or with AND. Chained where calls are combined with AND, for example:
Article.where(trashed: true).where(trashed: false)
The SQL generated will be:
SELECT * FROM articles WHERE trashed = 1 AND trashed = 0
For OR, Rails 5 provides the or method, which takes another relation:
Foo.where(foo: 'bar').or(Foo.where(bar: 'bar'))
or you can simply write the condition as SQL:
Foo.where('foo = ? OR bar = ?', 'bar', 'bar')
Applied to the question, chaining an additional AND condition looks like this:
@cars = Car.joins(:colour_collections).where("colour_collections.name = ?", params[:colour_collection]).where("cars.make = ?", params[:make])
For more discussion on chaining, see: How does Rails ActiveRecord chain "where" clauses without multiple queries?

Complex ActiveRecord query comparing datetime through many to many relation

So the objective of my query:
Fetch all of a single user's clients that have not had a meeting since 30 days ago.
A Client has_many :meetings, through: :contacts although contacts isn't very relevant here.
My query is as follows:
user.clients.where(is_dormant: false).joins(:meetings).distinct.where('meetings.actual_start_datetime <= ?', 30.days.ago).where.not('meetings.actual_start_datetime > ?', 30.days.ago)
which produces this SQL:
SELECT DISTINCT "clients".* FROM "clients" INNER JOIN "contacts" ON "contacts"."client_id" = "clients"."id" INNER JOIN "meetings" ON "meetings"."contact_id" = "contacts"."id" INNER JOIN "clients_users" ON "clients"."id" = "clients_users"."client_id" WHERE "clients_users"."user_id" = $1 AND "clients"."is_dormant" = $2 AND (meetings.actual_start_datetime <= '2016-12-31 20:29:08.972999') AND (NOT (meetings.actual_start_datetime > '2016-12-31 20:29:08.973484')) ORDER BY "clients"."name" ASC [["user_id", 1], ["is_dormant", "f"]]
But it seems to just ignore the where.not('meetings.actual_start_datetime > ?', 30.days.ago) clause. If I run the query without that clause, it returns the exact same result.
After many days of deliberating, it seems the easiest way to do this is get all of the clients who have had a meeting 30 or more days ago, then subtract from that array the clients who have had a meeting in the last 30 days, eg:
user_clients.without_recent_meetings - user_clients.with_recent_meetings
Is there any way to do this in one query, as this way means having to run a complex query twice?
Try this one:
user.clients.where(is_dormant: false).joins(:meetings).distinct.where('meetings.actual_start_datetime <= :cutoff AND clients.id NOT IN (SELECT c.client_id FROM meetings m INNER JOIN contacts c ON c.id = m.contact_id WHERE m.actual_start_datetime > :cutoff)', cutoff: 30.days.ago)
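The where.not clause in the question appears to be ignored because both conditions are evaluated against the same joined meeting row, and NOT (start > cutoff) is just another way of writing start <= cutoff. What is needed is a per-client check ("no meeting after the cutoff"), which is what the NOT IN subquery expresses. The plain-Ruby sketch below contrasts the two; the data and helper names are hypothetical, and meeting times are expressed as days before now for brevity.

```ruby
# Hypothetical data: client name => meeting start times, in days before now.
MEETINGS_DAYS_AGO = {
  'Acme'    => [90, 45], # only old meetings
  'Globex'  => [60, 5],  # one old, one recent
  'Initech' => [10]      # only recent
}
CUTOFF_DAYS = 30

# Row-wise filter (what the original query did): a client qualifies if ANY
# joined meeting row is old; negating "recent" on the same row adds nothing.
def row_wise(meetings, cutoff)
  meetings.select { |_client, days| days.any? { |d| d >= cutoff && !(d < cutoff) } }.keys
end

# Per-client filter (what NOT IN with a subquery does): at least one old
# meeting and no meeting more recent than the cutoff.
def dormant(meetings, cutoff)
  meetings.select do |_client, days|
    days.any? { |d| d >= cutoff } && days.none? { |d| d < cutoff }
  end.keys
end

p row_wise(MEETINGS_DAYS_AGO, CUTOFF_DAYS) # => ["Acme", "Globex"]
p dormant(MEETINGS_DAYS_AGO, CUTOFF_DAYS)  # => ["Acme"]
```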

Nested SQL SELECT in Rails 4

I'm looking for a way to generate the following SQL in Rails (to make it a scope), so that I could chain it with further scopes (e.g. Article.published.most_comments):
SELECT *, cs.count
FROM articles, (
SELECT article_id, count(*)
FROM comments
GROUP BY comments.article_id
) cs
WHERE articles.id = cs.article_id
ORDER BY cs.count DESC;
I've tried something along the lines of Article.joins(:comments).select('*').group('comments.article_id'), but that doesn't generate the desired SQL:
SELECT * FROM "articles"
INNER JOIN "comments" ON "comments"."article_id" = "articles"."id"
GROUP BY comments.article_id
(PSQL): PG::GroupingError: ERROR: column "articles.id"
must appear in the GROUP BY clause or be used in
an aggregate function
And there doesn't appear to be a .from method in which I could specify the nested SQL SELECT.
Actually, there's a .from method:
scope :most_comments, -> {
Article.select('*, cs.count').from(
'articles, (
SELECT article_id, count(*)
FROM comments
GROUP BY comments.article_id
) cs'
)
.where('articles.id = cs.article_id')
.order('cs.count DESC')
}
Not sure if this is the best way but it works...

Distinct Records with joins and order

I have a simple relationship between User and Donations in that a user has many donations, and a donation belongs to a user. What I'd like to do is get a list of users, ordered by the most recent donations.
Here's what I'm trying:
First I want to get the total number of uniq users, which is working as expected:
> User.joins(:donations).order('donations.created_at').uniq.count
(3.2ms) SELECT DISTINCT COUNT(DISTINCT "users"."id") FROM "users" INNER JOIN "donations" ON "donations"."user_id" = "users"."id"
=> 384
Next, when I remove the count method, I get an error that "ORDER BY expressions must appear in select list":
> User.joins(:donations).order('donations.created_at').uniq
User Load (0.9ms) SELECT DISTINCT "users".* FROM "users" INNER JOIN "donations" ON "donations"."user_id" = "users"."id" ORDER BY donations.created_at
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...ON "donations"."user_id" = "users"."id" ORDER BY donations....
Then I tried fixing the Postgres error by explicitly setting the SELECT clause which at first glance appears to work:
> User.select('DISTINCT "users".id, "users".*, "donations".created_at').joins(:donations).order('donations.created_at')
User Load (17.6ms) SELECT DISTINCT "users".id, "users".*, "donations".created_at FROM "users" INNER JOIN "donations" ON "donations"."user_id" = "users"."id" ORDER BY donations.created_at
However, the number of records returned does not take into account the DISTINCT statement and returns 692 records:
> _.size
=> 692
How do I get the expected number of results (384) while also sorting by the donation's created_at timestamp?
Try this:
User.select('users.*,MAX(donations.created_at) as most_recent_donation').
joins(:donations).order('most_recent_donation desc').group('users.id')
This assumes a user has many donations: it selects the most recent donation date for each user and, by grouping on users.id, returns each user only once.
I have not tested this, though.
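As a rough illustration of what GROUP BY users.id combined with MAX(donations.created_at) computes, here is a plain-Ruby sketch over hypothetical donation rows (the data and the users_by_latest_donation helper are made up for the example). Each user appears once, ordered by the date of their latest donation.

```ruby
# Hypothetical donation rows: [user_id, created_at as a sortable ISO date string].
DONATIONS = [
  [1, '2016-01-10'],
  [2, '2016-03-05'],
  [1, '2016-04-01'],
  [3, '2016-02-20'],
  [2, '2016-01-15']
]

# GROUP BY user_id, MAX(created_at), ORDER BY that maximum DESC.
def users_by_latest_donation(donations)
  donations
    .group_by { |user_id, _created_at| user_id }
    .map { |user_id, rows| [user_id, rows.map { |_id, created_at| created_at }.max] }
    .sort_by { |_user_id, latest| latest }
    .reverse
end

p users_by_latest_donation(DONATIONS)
# => [[1, "2016-04-01"], [2, "2016-03-05"], [3, "2016-02-20"]]
```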
