ActiveRecord distinct doesn't work - ruby-on-rails

I built a query with Active Record that uses several joins, and it returned duplicate rows. I chained distinct(:id) after select, but that didn't remove them.
Here's the code:
def join_query
  <<-SQL
    LEFT JOIN orders on orders.purchase_id = purchases.id
    LEFT JOIN products on products.id = orders.complete_product_id
  SQL
end
def select_query
  <<-SQL
    purchases.*,
    products.reference_code as products_reference_code
  SQL
end
result = Purchase.joins(join_query)
  .select(select_query)
  .distinct(:id)
Neither the distinct! nor the uniq method worked either. distinct! raised an ActiveRecord::ImmutableRelation error, which I don't understand.
As a workaround, I converted the ActiveRecord_Relation object to an Array and used Ruby's uniq method.
What's going on here?

Try this:
def select_query
  <<-SQL
    DISTINCT ON (purchases.id) purchases.id,
    products.reference_code as products_reference_code
  SQL
end
Add any other columns you need to the select clause, separated by commas, then run:
Purchase.select(select_query).joins(join_query)
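A note on why the original call had no effect: as far as I know, distinct takes a boolean rather than a column name, so distinct(:id) simply issues SELECT DISTINCT over every selected column, and rows that differ only in products_reference_code all survive. The plain-Ruby sketch below mimics that with made-up sample rows (the data and variable names are assumptions for illustration). It is not ActiveRecord code.

```ruby
# Rows as the joined SELECT might return them (hypothetical sample data).
rows = [
  { id: 1, products_reference_code: "A-100" },
  { id: 1, products_reference_code: "B-200" },
  { id: 2, products_reference_code: "A-100" },
]

# SELECT DISTINCT compares whole rows, so no row is collapsed here.
distinct_rows = rows.uniq

# DISTINCT ON (purchases.id) keeps one row per id. Ruby's uniq keeps the
# first row it sees; PostgreSQL needs an ORDER BY to make that choice
# predictable.
distinct_on_id = rows.uniq { |row| row[:id] }

puts distinct_rows.size
puts distinct_on_id.map { |row| row[:id] }.inspect
```

This is also why the Array#uniq workaround appeared to work: ActiveRecord objects compare equal by id, so it behaves like the second call, not the first.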

Related

Query on ruby on Rails

How do I write the following query in Ruby on Rails?
SELECT
orders.item_total,
orders.total,
payments.created_at,
payments.updated_at
FROM
public.payments,
public.orders,
public.line_items,
public.variants
WHERE
payments.order_id = orders.id AND
orders.id = line_items.order_id AND
This works in Postgres, but I'm new to RoR and am having difficulty translating this sample query.
So far this is what I have:
Order.joins(:payments,:line_items,:variants).where(payments:{order_id: [Order.ids]}, orders:{id:LineItem.orders_id}).distinct.pluck(:email, :id, "payments.created_at", "payments.updated_at")
I went through a number of references before asking. Here are the links:
How to combine two conditions in a where clause?
Rails PG::UndefinedTable: ERROR: missing FROM-clause entry for table
Rails ActiveRecord: Pluck from multiple tables with same column name
ActiveRecord find and only return selected columns
https://guides.rubyonrails.org/v5.2/active_record_querying.html
From those links I produced this code, which works for testing:
Spree::Order.joins(:payments,:line_items,:variants).where(id: [Spree::Payment.ids]).distinct.pluck(:email, :id)
But when I try to combine multiple conditions and pluck a specific column from a different table, it gives me an error.
Update
I'm now using Ransack for the query and produced this code:
@search = Spree::Order.ransack(
  orders_gt: params[:q][:created_at_gt],
  orders_lt: params[:q][:created_at_lt],
  payments_order_id_in: [Spree::Order.ids],
  payments_state_eq: 'completed',
  orders_id_in: [Spree::LineItem.all.pluck(:order_id)],
  variants_id_in: [Spree::LineItem.ids]
)
@payment_report = @search.result
  .includes(:payments, :line_items, :variants)
  .joins(:line_items, :payments, :variants).select('payments.response_code, orders.number, payments.number')
I don't get an error when I remove the select part, but I need those specific columns. Is there a way?
You just have to join the tables and then select the columns you want:
Spree::Order.joins(:payments, :line_items).pluck("spree_orders.total, spree_orders.item_total, spree_payments.created_at, spree_payments.updated_at")
or
Spree::Order.joins(:payments, :line_items).select("spree_orders.total, spree_orders.item_total, spree_payments.created_at, spree_payments.updated_at")
That is roughly equivalent to this query (joins with association names generates INNER JOINs):
SELECT spree_orders.total,
spree_orders.item_total,
spree_payments.created_at,
spree_payments.updated_at
FROM "spree_orders"
INNER JOIN "spree_payments" ON "spree_payments"."order_id" = "spree_orders"."id"
INNER JOIN "spree_line_items" ON "spree_line_items"."order_id" = "spree_orders"."id"
You can use the select_all method. It returns an instance of the ActiveRecord::Result class, and calling to_hash on that object returns an array of hashes, where each hash represents one record.
Order.connection.select_all("SELECT
orders.item_total,
orders.total,
payments.created_at,
payments.updated_at
FROM
public.payments,
public.orders,
public.line_items,
public.variants
WHERE
payments.order_id = orders.id AND
orders.id = line_items.order_id").to_hash
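To make the "array of hashes" shape concrete, here is a plain-Ruby sketch of how a result with separate column names and row arrays maps onto one hash per record. ActiveRecord::Result does expose columns and rows, but the sample values below are invented and the conversion shown is only an illustration of the idea, not the library's implementation.

```ruby
# Column names and raw rows, in the shape a SQL result set typically has
# (hypothetical sample values).
columns = %w[item_total total created_at updated_at]
rows = [
  [10.0, 12.5, "2019-01-01", "2019-01-02"],
  [20.0, 21.0, "2019-02-01", "2019-02-03"],
]

# Pair each row's values with the column names to get one hash per record.
records = rows.map { |row| columns.zip(row).to_h }

records.each { |record| puts record.inspect }
```

Each record can then be read by column name, for example records.first["total"].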

Can you do a group by with find_each in rails?

I am trying to write a function that groups by some columns in a very large table (millions of rows). Is there any way to get find_each to work with this, or is it impossible given that I do not want to order by the id column?
The SQL of my query is:
SELECT derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;
Rails' find_each automatically adds the ORDER BY clause using a reorder statement. I have tried changing the SQL to:
SELECT MAX(id) AS "mytable"."id", derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;
but that doesn't work either. Any ideas other than writing my own find_each function or overriding the private batch_order function in batches.rb?
There are at least two approaches to solve this problem:
I. Use subquery:
# query the table and select id, derivable_type and derivable_id
my_table_ids = MyTable
.group("derivable_type, derivable_id")
.select("MAX(id) AS my_table_id, derivable_type, derivable_id")
# use subquery to allow rails to use ORDER BY in find_each
MyTable
.where(id: my_table_ids.select('my_table_id'))
.find_each { |row| do_something(row) }
II. Write a custom find_each function:
# Iterates over a grouped relation in batches, ordering by the grouped
# columns instead of the primary key.
def find_each_grouped(rows, columns, &block)
  offset = 0
  batch_size = 1_000
  loop do
    # Load the batch once so that size below counts the loaded records
    batch = rows
      .order(*columns)
      .offset(offset)
      .limit(batch_size)
      .to_a
    batch.each(&block)
    break if batch.size < batch_size
    offset += batch_size
  end
end
rows = MyTable
  .group("derivable_type, derivable_id")
  .select("derivable_type, derivable_id")
find_each_grouped(rows, ['derivable_type', 'derivable_id']) do |row|
  do_something(row)
end
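The batching loop in approach II can be tried without a database. The sketch below applies the same order/offset/limit pattern to an in-memory array of (derivable_type, derivable_id) pairs. The method name each_in_offset_batches and the sample pairs are hypothetical, chosen only for this illustration.

```ruby
# Yields every element of rows in sorted order, fetching batch_size
# elements at a time, the way ORDER BY ... OFFSET ... LIMIT would.
def each_in_offset_batches(rows, batch_size: 1_000, &block)
  sorted = rows.sort
  offset = 0
  loop do
    # Array#[] returns nil when offset is past the end, hence the fallback.
    batch = sorted[offset, batch_size] || []
    batch.each(&block)
    break if batch.size < batch_size
    offset += batch_size
  end
end

# Hypothetical grouped pairs, deliberately unsorted.
pairs = [["Post", 2], ["Comment", 1], ["Post", 1], ["Comment", 3], ["Post", 3]]
seen = []
each_in_offset_batches(pairs, batch_size: 2) { |pair| seen << pair }

puts seen.inspect
```

One design caveat: OFFSET-based paging makes the database skip over all earlier rows for every batch, so on a table with millions of groups the later batches get progressively slower. That is the trade-off against find_each, which pages by primary key.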
I'm not sure I'm 100% clear on what you're trying to do, but your query looks the same as doing an aggregate distinct()
SELECT derivable_type, derivable_id FROM "mytable" GROUP BY derivable_type, derivable_id ORDER BY "mytable"."id" ASC;
---- vv
SELECT DISTINCT(derivable_type, derivable_id) FROM "mytable" ORDER BY "mytable"."id" ASC;
You should be able to use Active Record to accomplish this, combined with find_each (if Mytable is your model):
Mytable.all.group(:derivable_type, :derivable_id).distinct.find_each
# gives => #<Enumerator: #<ActiveRecord::Relation [...]>:find_each({:start=>nil, :finish=>nil, :batch_size=>1000, :error_on_ignore=>nil})>

how to write a join condition in Ruby on Rails?

I'm trying to write something like the following query:
SELECT *
FROM Customers c
LEFT JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
Notice that I'm not using a WHERE clause, but I need my JOIN to have a condition. I cannot make it work in Ruby on Rails.
Can you help me out?
You can join the tables with a LEFT JOIN. Just pass the join condition as a string to joins and you will get the expected result:
Customer.joins("LEFT JOIN CustomerAccounts
ON CustomerAccounts.CustomerID = Customers.CustomerID
AND Customers.State = 'NY'")
#=> SELECT * FROM Customers LEFT JOIN CustomerAccounts ON CustomerAccounts.CustomerID = Customers.CustomerID AND Customers.State = 'NY'
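Note: joins with an association name produces an INNER JOIN, so to get a LEFT JOIN with an extra condition you need to write the join clause yourself.
Your SQL, translated to ActiveRecord with joins, would look roughly as follows. Be aware that this generates an INNER JOIN plus a WHERE clause, so customers outside NY are filtered out instead of being returned with NULL account columns: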
Customer.where(state: 'NY').joins(:customer_accounts)
The code assumes, you have the association set up:
class Customer
has_many :customer_accounts
end
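The two answers above do not return the same rows. The following plain-Ruby sketch simulates both joins on made-up data to show the difference between putting the state condition in the ON clause of a LEFT JOIN and putting it in a WHERE clause after an INNER JOIN. The sample records and variable names are assumptions for illustration only.

```ruby
# Hypothetical sample data.
customers = [
  { id: 1, state: "NY" },
  { id: 2, state: "CA" },
]
accounts = [
  { id: 10, customer_id: 1 },
  { id: 11, customer_id: 2 },
]

# LEFT JOIN ... ON account.customer_id = customer.id AND customer.state = 'NY'
# Every customer is kept; accounts are attached only when the ON test passes.
left_join_on = customers.flat_map do |c|
  matches = accounts.select { |a| a[:customer_id] == c[:id] && c[:state] == "NY" }
  matches.empty? ? [[c[:id], nil]] : matches.map { |a| [c[:id], a[:id]] }
end

# INNER JOIN ... WHERE customer.state = 'NY'
# Customers failing the WHERE test, or without accounts, disappear entirely.
inner_join_where = customers.select { |c| c[:state] == "NY" }.flat_map do |c|
  accounts.select { |a| a[:customer_id] == c[:id] }.map { |a| [c[:id], a[:id]] }
end

puts left_join_on.inspect      # [[1, 10], [2, nil]]
puts inner_join_where.inspect  # [[1, 10]]
```

The customer from CA survives the first join with a nil account, which is the behaviour the question asks for.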

Add computable column to multi-table select clause with eager_load in Ruby on Rails Activerecord

I have a query with a lot of joins, and I'm eager-loading some of the associations at the same time. I also need to compute a value and expose it as an attribute of one of the models.
So, I'm trying this code:
ServiceObject
.joins([{service_days: :ou}, :address])
.eager_load(:address, :service_days)
.where(ous: {id: OU.where(sector_code: 5)})
.select('SDO_CONTAINS(ous.service_area_shape, SDO_GEOMETRY(2001, 8307, sdo_point_type(addresses.lat, addresses.lng, NULL), NULL, NULL) ) AS in_zone')
Here the SQL function call in select operates on data from the associated addresses and ous tables.
I get the following SQL (so my in_zone column is calculated and returned as the first column, before the columns of all the eager-loaded models):
SELECT SDO_CONTAINS(ous.service_area_shape, SDO_GEOMETRY(2001, 8307, sdo_point_type(addresses.lat, addresses.lng, NULL), NULL, NULL) ) AS in_zone, "SERVICE_OBJECTS"."ID" AS t0_r0, "SERVICE_OBJECTS"."TYPE" AS t0_r1, <omitted for brevity> AS t2_r36 FROM "SERVICE_OBJECTS" INNER JOIN "SERVICE_DAYS" ON "SERVICE_DAYS"."SERVICE_OBJECT_ID" = "SERVICE_OBJECTS"."ID" INNER JOIN "OUS" ON "OUS"."ID" = "SERVICE_DAYS"."OU_ID" INNER JOIN "ADDRESSES" ON "ADDRESSES"."ID" = "SERVICE_OBJECTS"."ADDRESS_ID" WHERE "OUS"."ID" IN (SELECT "OUS"."ID" FROM "OUS" WHERE "OUS"."SECTOR_CODE" = :a1) [["sector_code", "5"]]
But in_zone doesn't seem to be accessible from any of the models used in the query.
I need the calculated in_zone to be available as an attribute of the ServiceObject model object. How can I accomplish that?
Ruby on Rails 4.2.6, Ruby 2.3.0, oracle_enhanced adapter 1.6.7, Oracle 12.1
I have successfully replicated your issue and it turns out that this is a known issue in Rails. The problem is that when using eager_load, Rails maps the columns of all eager-loaded tables into table and column aliases in the form of t0_r0, t0_r1, etc. (you can see these in the SQL that you pasted in the question). While doing that, it simply ignores the custom columns in the select, probably because it cannot determine which eager-loaded table it should attribute the custom column to. Sadly, this issue has been open for more than 2 years now.
Nevertheless I think I found a workaround. It seems that if you don't eager load the tables but manually join them (with joins), you can as well include them (with includes) and the custom columns will be returned as there will be no column aliasing taking place. The point is that you must not use associations in the joins clauses but you have to specify the joins yourself. Also note that you must specify all columns from the main table in the select manually too (see the service_objects.* in the select).
Try the following approach:
ServiceObject
.joins('INNER JOIN "SERVICE_DAYS" ON "SERVICE_DAYS"."SERVICE_OBJECT_ID" = "SERVICE_OBJECTS"."ID"')
.joins('INNER JOIN "OUS" ON "OUS"."ID" = "SERVICE_DAYS"."OU_ID"')
.joins('INNER JOIN "ADDRESSES" ON "ADDRESSES"."ID" = "SERVICE_OBJECTS"."ADDRESS_ID"')
.includes(:service_days, :address)
.where(ous: {id: OU.where(sector_code: 5)})
.select('service_objects.*, SDO_CONTAINS(ous.service_area_shape, SDO_GEOMETRY(2001, 8307, sdo_point_type(addresses.lat, addresses.lng, NULL), NULL, NULL) ) AS in_zone')
The computation in the select should still work as the related tables are joined together but there should be no column aliasing present.
Of course this approach means that you'll get three queries instead of just one but unless you return a huge amount of records, the following two queries run by the includes clause should be very fast as they simply load the relevant records using foreign keys.
This monkey patch helped @Envek (it relies on alias_method_chain, so it targets Rails 4.x as used in the question):
module ActiveRecord
  Base.send :attr_accessor, :_row_
  module Associations
    class JoinDependency
      JoinBase && class JoinPart
        def instantiate_with_row(row, *args)
          instantiate_without_row(row, *args).tap { |i| i._row_ = row }
        end; alias_method_chain :instantiate, :row
      end
    end
  end
end
Then it is possible to do:
ServiceObject
.joins([{service_days: :ou}, :address])
.eager_load(:address, :service_days)
.where(ous: {id: OU.where(sector_code: 5)})
.select('SDO_CONTAINS(ous.service_area_shape, SDO_GEOMETRY(2001, 8307, sdo_point_type(addresses.lat, addresses.lng, NULL), NULL, NULL) ) AS in_zone')
.first
._row_['in_zone']

ActiveRecord change or reset ordering defined in scope

I have a function which uses another function's output: an ActiveRecord::Relation object. This relation already has an order clause:
# This function cannot be changed
def black_box
Product.where('...').order("name")
end
def my_func
black_box.order("id")
end
When I execute the relation, the ORDER BY terms appear in the order in which the order calls were made:
SELECT * FROM products
WHERE ...
ORDER BY name, id -- the first order call, then the second
Is there any way to make the relation put my order term BEFORE the existing one, so that the SQL looks like this?
SELECT * FROM products
WHERE ...
ORDER BY id, name
You could use the reorder method to discard the original ordering and then specify the full ordering you want, starting with your new column. From the documentation:
reorder(*args)
Replaces any existing order defined on the relation with the specified order.
User.order('email DESC').reorder('id ASC') # generated SQL has 'ORDER BY id ASC'
Subsequent calls to order on the same relation will be appended. For example:
User.order('email DESC').reorder('id ASC').order('name ASC')
# generates a query with 'ORDER BY id ASC, name ASC'.
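The difference between order and reorder can be seen in a small stand-in object. OrderSketch below is a hypothetical class written for this illustration; it tracks only the ORDER BY terms and is not part of Rails, but its two methods follow the append and replace semantics quoted above.

```ruby
# Hypothetical stand-in for a relation that tracks only its ORDER BY terms.
class OrderSketch
  attr_reader :order_terms

  def initialize(order_terms = [])
    @order_terms = order_terms.freeze
  end

  # Like Relation#order: appends to the terms already present.
  def order(*terms)
    OrderSketch.new(order_terms + terms)
  end

  # Like Relation#reorder: discards the existing terms first.
  def reorder(*terms)
    OrderSketch.new(terms)
  end

  def to_sql_fragment
    order_terms.empty? ? "" : "ORDER BY #{order_terms.join(', ')}"
  end
end

black_box = OrderSketch.new.order("name")
appended  = black_box.order("id").to_sql_fragment
replaced  = black_box.reorder("id", "name").to_sql_fragment

puts appended  # ORDER BY name, id
puts replaced  # ORDER BY id, name
```

With a real relation, the corresponding call in my_func would be black_box.reorder(:id, :name). The existing name term has to be restated, since reorder drops it.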
