Distinct Records with joins and order - ruby-on-rails

I have a simple relationship between User and Donations in that a user has many donations, and a donation belongs to a user. What I'd like to do is get a list of users, ordered by the most recent donations.
Here's what I'm trying:
First I want to get the total number of unique users, which is working as expected:
> User.joins(:donations).order('donations.created_at').uniq.count
(3.2ms) SELECT DISTINCT COUNT(DISTINCT "users"."id") FROM "users" INNER JOIN "donations" ON "donations"."user_id" = "users"."id"
=> 384
Next, when I remove the count method, I get an error that "ORDER BY expressions must appear in select list":
> User.joins(:donations).order('donations.created_at').uniq
User Load (0.9ms) SELECT DISTINCT "users".* FROM "users" INNER JOIN "donations" ON "donations"."user_id" = "users"."id" ORDER BY donations.created_at
PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...ON "donations"."user_id" = "users"."id" ORDER BY donations....
Then I tried fixing the Postgres error by explicitly setting the SELECT clause which at first glance appears to work:
> User.select('DISTINCT "users".id, "users".*, "donations".created_at').joins(:donations).order('donations.created_at')
User Load (17.6ms) SELECT DISTINCT "users".id, "users".*, "donations".created_at FROM "users" INNER JOIN "donations" ON "donations"."user_id" = "users"."id" ORDER BY donations.created_at
However, the number of records returned does not take into account the DISTINCT statement and returns 692 records:
> _.size
=> 692
How do I get the expected number of results (384) while also sorting by the donation's created_at timestamp?

Try this:
User.select('users.*,MAX(donations.created_at) as most_recent_donation').
joins(:donations).order('most_recent_donation desc').group('users.id')
Assuming a user has many donations, this selects each user's most recent donation timestamp and, by grouping on users.id, returns one row per user.
I have not tested this, though.
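To see why adding donations.created_at to a DISTINCT select does not collapse rows, while grouping by user does, here is a minimal plain-Ruby model of the joined rows. The Donation struct and the sample data are made up for illustration; they are not the asker's schema or data.

```ruby
# Hypothetical stand-in for the rows produced by users JOIN donations.
Donation = Struct.new(:user_id, :created_at)

donations = [
  Donation.new(1, 10),
  Donation.new(1, 30),
  Donation.new(2, 20),
  Donation.new(3, 5),
  Donation.new(3, 40)
]

# SELECT DISTINCT users.id, donations.created_at:
# every (user, timestamp) pair is already unique, so nothing collapses.
distinct_pairs = donations.map { |d| [d.user_id, d.created_at] }.uniq

# GROUP BY users.id with MAX(donations.created_at): one row per user.
most_recent = donations
  .group_by(&:user_id)
  .map { |user_id, rows| [user_id, rows.map(&:created_at).max] }

# ORDER BY most_recent_donation DESC
ordered_user_ids = most_recent.sort_by { |_, ts| -ts }.map(&:first)

puts distinct_pairs.size
puts ordered_user_ids.inspect
```

In this model the DISTINCT version still has one row per donation, while the grouped version has one row per user, which is the shape the question asks for.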

Related

Ruby on Rails - order not working with distinct.pluck

app/models/line_item.rb
class LineItem < ApplicationRecord
  default_scope { order(:order_date, :line_item_index) }
  scope :sorted, -> { order(:order_date, :line_item_index) }
  scope :open_order_names, -> { distinct.pluck(:order_name) }
end
What I have tried:
LineItem.open_order_names # Way 1
LineItem.sorted.open_order_names # Way 2
LineItem.open_order_names.sorted # Way 3
But I am always getting this error.
ActiveRecord::StatementInvalid (PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ...ne_items"."order_name" FROM "line_items" ORDER BY "line_item...
^
):
Can anyone help me?
The issue is that you need to specify how the rows should be distinct. The following should work for you; the select may not be needed.
scope :open_order_names, -> { select(:order_name).distinct(:order_name).pluck(:order_name) }
This is a database restriction. For example, suppose we have a users table with columns (id, email).
You can do:
SELECT DISTINCT "users"."email" FROM "users"
or
SELECT "users"."email" FROM "users" ORDER BY "users"."id" ASC
but you cannot do:
SELECT DISTINCT "users"."email" FROM "users" ORDER BY "users"."id" ASC
That is, when you use DISTINCT you cannot order by a column that is absent from the SELECT part of the query.
As mentioned above,
scope :open_order_names, -> { select(:order_name).distinct(:order_name).pluck(:order_name) }
could be a nice solution.
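The reason for the restriction can be sketched in plain Ruby: once duplicate emails are collapsed, each remaining email may correspond to several ids, so "order by id" no longer has a single answer. The sample rows below are hypothetical.

```ruby
# Hypothetical users rows: two rows share the same email.
rows = [
  { id: 1, email: "a@example.com" },
  { id: 2, email: "b@example.com" },
  { id: 3, email: "a@example.com" }
]

# SELECT DISTINCT email
distinct_emails = rows.map { |r| r[:email] }.uniq

# Each distinct email maps to one or more ids.
ids_per_email = rows
  .group_by { |r| r[:email] }
  .transform_values { |rs| rs.map { |r| r[:id] } }

# "ORDER BY id" is ambiguous: the smallest and the largest id per email
# give different orderings of the same distinct values.
by_min_id = ids_per_email.sort_by { |_, ids| ids.min }.map(&:first)
by_max_id = ids_per_email.sort_by { |_, ids| ids.max }.map(&:first)

puts by_min_id.inspect
puts by_max_id.inspect
```

In ActiveRecord, reorder(nil) and unscope(:order) both drop an inherited ordering, which may be relevant when the ORDER BY comes from a default_scope rather than from the scope being called.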

Remove duplicated records keeping last using ActiveRecord

I've been trying to remove the records that are duplicated (same value in the column shopify_order_id) keeping the most recent one.
I wrote it in sql:
select orders.id from (
  select shopify_order_id, min(shopify_created_at) as min_created
  from orders group by shopify_order_id having count(*) > 1 limit 5000
) as keep_orders
join orders
  on keep_orders.shopify_order_id = orders.shopify_order_id
  and orders.shopify_created_at <> keep_orders.min_created
Now I'm trying to translate it to Active Record, but I can't seem to join the two parts.
The first nested select is
keep_orders = Order.select('shopify_order_id, MIN(shopify_created_at) as min_created').
  group(:shopify_order_id).
  having('count(*) > 1').
  limit(5000)
but then the following doesn't work:
Order.select('orders.id').from(keep_orders, :keep_orders).
joins('orders ON keep_orders.shopify_order_id = orders.shopify_order_id').
where.not('orders.shopify_created_at = keep_orders.min_created')
it builds the query:
SELECT orders.id FROM (SELECT shopify_order_id, MIN(shopify_created_at) as min_created FROM "orders" GROUP BY "orders"."shopify_order_id" HAVING (count(*) > 1) LIMIT $1) keep_orders orders ON keep_orders.shopify_order_id = orders.shopify_order_id WHERE NOT (orders.shopify_created_at = keep_orders.min_created) ORDER BY "orders"."id" ASC LIMIT $2 [["LIMIT", 5000], ["LIMIT", 1]]
which is missing the keyword join.
Any help on how to refactor the query/do it in another way would be more than appreciated.
If you call joins with a string SQL fragment you need to specify the type of join you want:
Order.select('orders.id').from(keep_orders, :keep_orders)
.joins('JOIN orders ON keep_orders.shopify_order_id = orders.shopify_order_id')
.where.not('orders.shopify_created_at = keep_orders.min_created')
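As a sanity check of which ids the query selects, here is a plain-Ruby model of the same logic. The OrderRow struct and the sample rows are hypothetical. Note that because the subquery uses MIN, the ids returned are those of every duplicate except the earliest one in each group.

```ruby
# Hypothetical stand-in for rows of the orders table.
OrderRow = Struct.new(:id, :shopify_order_id, :shopify_created_at)

orders = [
  OrderRow.new(1, "A", 100),
  OrderRow.new(2, "A", 200),
  OrderRow.new(3, "B", 150),
  OrderRow.new(4, "C", 300),
  OrderRow.new(5, "C", 250),
  OrderRow.new(6, "C", 400)
]

# Subquery: GROUP BY shopify_order_id HAVING count(*) > 1,
# selecting MIN(shopify_created_at) as min_created.
keep_orders = orders
  .group_by(&:shopify_order_id)
  .select { |_, rows| rows.size > 1 }
  .transform_values { |rows| rows.map(&:shopify_created_at).min }

# JOIN orders ON the shared shopify_order_id,
# keeping rows whose timestamp differs from min_created.
selected_ids = orders
  .select { |o| keep_orders.key?(o.shopify_order_id) }
  .reject { |o| o.shopify_created_at == keep_orders[o.shopify_order_id] }
  .map(&:id)

puts selected_ids.inspect
```

If the goal is to keep the most recent row instead, the same model suggests swapping min for max (MIN for MAX in the SQL).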

What is default order when there is no explicit defined order?

In Rails ActiveRecord, when I do something like event_instances.order(:created_at) without specifying a direction, which order is the default: DESC or ASC?
Thanks in advance.
According to the Rails documentation, when you pass a symbol, the sort is explicitly set to ASC; when you pass a string, no direction is added and the database's default order applies:
User.order(:name)
=> SELECT "users".* FROM "users" ORDER BY "users"."name" ASC
User.order('name')
=> SELECT "users".* FROM "users" ORDER BY name
Sort orders in DBs:
For Postgres:
ASC order is the default.
For MySQL 5.7:
The default is ascending order; this can be specified explicitly using the ASC keyword.
For SQLite:
If neither ASC or DESC are specified, rows are sorted in ascending (smaller values first) order by default.
So for all the main DBs, the default order is ASC.

Combine a join query with a normal condition

Users have a main category and multiple sub-categories
I would like to get all users who belong to a category, regardless if it is their main or sub.
@users = User.joins(:sub_categories).where('sub_category_id = ? OR type = ?', @sub_category.id, "User::#{category}User").page(params[:page])
A user's main category is also their STI type.
The query works, however I am getting duplicate results when trying to include the user's main type.
The query that is generated is:
User Load (0.0ms) SELECT "users".* FROM "users" INNER JOIN "user_sub_categories_users" ON "user_sub_categories_users"."user_id" = "users"."id" INNER JOIN "user_sub_categories" ON "user_sub_categories"."id" = "user_sub_categories_users"."sub_category_id" WHERE "users"."deleted_at" IS NULL AND (sub_category_id = 1 OR type = 'User::ModelUser') ORDER BY "users"."created_at" DESC LIMIT 20 OFFSET 0
EDIT: A user cannot belong to a sub-category if it's their main category, so it's safe to simply join the two conditions together.
Because there is more than one group for each user, your join is creating multiple rows for each user, e.g.:
User  | Group
------|-------
UserA | GroupA
UserA | GroupB
UserB | GroupA
UserC | GroupA
UserC | GroupB
Three users, but five rows!
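You can safely add a .uniq to the end of your query if you're only interested in the distinct users. In the context of an ActiveRecord query, .uniq is converted to SQL's SELECT DISTINCT. (On newer Rails versions the relation method is .distinct; .uniq on relations was deprecated in Rails 5.)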
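The row multiplication can be reproduced in plain Ruby. The pairs below are the five user/group rows from the table in this answer, and Array#uniq stands in for SELECT DISTINCT; this is only a model of the join result, not ActiveRecord code.

```ruby
# One entry per row of the joined result (user, group).
joined_rows = [
  ["UserA", "GroupA"],
  ["UserA", "GroupB"],
  ["UserB", "GroupA"],
  ["UserC", "GroupA"],
  ["UserC", "GroupB"]
]

# Selecting only the user from each joined row repeats users.
users_from_join = joined_rows.map(&:first)

# Deduplicating collapses them back to one entry per user.
distinct_users = users_from_join.uniq

puts users_from_join.size
puts distinct_users.inspect
```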

How to reduce database queries and is it Worth it?

I currently have an app that lists flights from one location to another, its price and other information. I have implemented a search through a drop down list so it only shows flights either from a certain location, to a certain location or from and to a certain location, depending on how the user searches.
def index
  @flights = Flight.all
  @flights_source = @flights.select('DISTINCT source') # this line is used for options_from_collection_for_select in view
  @flights_destination = @flights.select('DISTINCT destination') # this line is used for options_from_collection_for_select in view
  if params[:leaving_from].present? && params[:going_to].blank?
    @flights = Flight.where(:source => params[:leaving_from])
  elsif params[:going_to].present? && params[:leaving_from].blank?
    @flights = Flight.where(:destination => params[:going_to])
  elsif params[:leaving_from].present? && params[:going_to].present?
    @flights = Flight.where(:source => params[:leaving_from]).where(:destination => params[:going_to])
  end
end
The problem is that every time I want to add another search parameter, for example price, it's going to be another query. Is there a way to take Flight.all, search within the result, and make a new hash or array with only the records that match the search terms, instead of doing a new query with select DISTINCT?
The closest thing I could come up with is somehow turning the result of Flight.all into an array of hashes and using that to get the results for distinct source and destination, but I'm not sure how to do that.
And finally would it be worth it to do this to reduce the number of database queries?
These are the current queries:
Flight Load (1.4ms) SELECT "flights".* FROM "flights"
User Load (1.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 2]]
Flight Load (1.4ms) SELECT DISTINCT source FROM "flights"
Flight Load (0.8ms) SELECT DISTINCT destination FROM "flights"
EDIT:
I changed the select distinct to
@flights_source = @flights.uniq.pluck(:source)
@flights_destination = @flights.uniq.pluck(:destination)
And I used options_for_select instead of options_from_collection_for_select in the view. But the queries are still there. I think this means I have eliminated as much as I can, though I'm not sure.
(0.8ms) SELECT DISTINCT "flights"."source" FROM "flights"
(0.6ms) SELECT DISTINCT "flights"."destination" FROM "flights"
Request Load (1.3ms) SELECT "requests".* FROM "requests"
Flight Load (1.0ms) SELECT "flights".* FROM "flights"
User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT 1 [["id", 2]]
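One possible way to avoid a new branch per search parameter, sketched in plain Ruby: build a single conditions hash from whichever parameters are present, then pass it to one where call. The helper name flight_conditions and the PARAM_TO_COLUMN mapping are hypothetical; only the parameter and column names come from the controller above.

```ruby
# Hypothetical mapping from request parameter names to flights columns.
PARAM_TO_COLUMN = { leaving_from: :source, going_to: :destination }.freeze

# Returns a hash of column => value for every parameter that is present.
# Missing or blank parameters are skipped.
def flight_conditions(params)
  PARAM_TO_COLUMN.each_with_object({}) do |(param, column), conditions|
    value = params[param].to_s.strip
    conditions[column] = value unless value.empty?
  end
end

only_source = flight_conditions(leaving_from: "Oslo", going_to: "")
both        = flight_conditions(leaving_from: "Oslo", going_to: "Rome")
neither     = flight_conditions({})

puts only_source.inspect
puts both.inspect
puts neither.inspect
```

In the controller this would be used as @flights = Flight.where(flight_conditions(params)). Adding a search field then means adding one entry to the mapping, and the number of queries stays the same.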
