I have the following methods in a model named CashTransaction:
def is_refundable?
  self.amount > self.total_refunded_amount
end
def total_refunded_amount
  self.refunds.sum(:amount)
end
Now I need to fetch all the records that satisfy the above method, i.e. records for which is_refundable? returns true.
I got that working with the following statement:
CashTransaction.all.map { |x| x if x.is_refundable? }
But the result is an Array. I am looking for an ActiveRecord_Relation object, as I need to perform a join on the result.
I feel I am missing something here, as it doesn't look that difficult. Anyway, it got me stuck. Constructive suggestions would be great.
Note: only amount is a column on CashTransaction.
EDIT
The following SQL does the job. If I can translate it to the ORM, that will do the job too.
SELECT `cash_transactions`.*
FROM `cash_transactions`
INNER JOIN `refunds` ON `refunds`.`cash_transaction_id` = `cash_transactions`.`id`
WHERE (cash_transactions.amount > (
  SELECT SUM(`amount`) FROM `refunds`
  WHERE refunds.cash_transaction_id = cash_transactions.id
  GROUP BY `cash_transaction_id`));
Sharing Progress
I managed to get it working with the following ORM query:
CashTransaction
  .joins(:refunds)
  .group('cash_transactions.id')
  .having('cash_transactions.amount > sum(refunds.amount)')
But what I was actually looking for was something like:
CashTransaction.joins(:refunds).where(is_refundable? : true)
where is_refundable? is a model method. Initially I thought declaring is_refundable? as an attr_accessor would work, but I was wrong.
Just a thought: can the problem be solved in an elegant way using Arel?
There are two options.
1) Finish what you have started (which is extremely inefficient for larger amounts of data, since everything is loaded into memory before processing):
CashTransaction.all.select(&:is_refundable?) # what your map call was aiming for, but shorter and without nils
So get the ids:
ids = CashTransaction.all.select(&:is_refundable?).map(&:id)
And now, to get an ActiveRecord relation:
CashTransaction.where(id: ids) # will return a relation
2) Move the calculation to SQL (this form assumes total_refunded_amount is a column on cash_transactions):
CashTransaction.where('amount > total_refunded_amount')
The second option is faster and more efficient in every way.
When you deal with a database, try to do the processing at the database level, with as little Ruby involvement as possible.
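On the first option, it helps to see why the filtering step needs select rather than map. The following is a minimal plain-Ruby sketch; Txn and its fields are hypothetical stand-ins for the model, not ActiveRecord code:

```ruby
# Hypothetical stand-in for CashTransaction: refunded plays the role of
# total_refunded_amount.
Txn = Struct.new(:id, :amount, :refunded) do
  def is_refundable?
    amount > refunded
  end
end

txns = [Txn.new(1, 100, 40), Txn.new(2, 50, 50), Txn.new(3, 80, 0)]

# map always returns one element per input, so non-matching records become nil.
mapped = txns.map { |t| t if t.is_refundable? }

# map with the predicate alone returns booleans, not records.
flags = txns.map(&:is_refundable?)

# select keeps only the records for which the predicate is true.
selected = txns.select(&:is_refundable?)
refundable_ids = selected.map(&:id)
```

With real models, refundable_ids is what you would pass to where(id: ...) to get a relation back.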
EDIT
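According to the edited question, here is how you would achieve the desired result (an aggregate such as SUM cannot be used in WHERE, so the comparison goes into HAVING):
CashTransaction.joins(:refunds).group('cash_transactions.id').having('cash_transactions.amount > SUM(refunds.amount)')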
EDIT #2
As to the updates in your question: I don't really understand why you are so set on using is_refundable? as an instance method inside a query, which is basically not possible in AR, but...
My suggestion is to create a scope is_refundable:
scope :is_refundable, -> {
  joins(:refunds)
    .group('cash_transactions.id')
    .having('cash_transactions.amount > sum(refunds.amount)')
}
Now it is available with a notation as short as
CashTransaction.is_refundable
which is shorter and clearer than the intended
CashTransaction.where('is_refundable = ?', true)
You can do it this way:
cash_transactions = CashTransaction.all.select { |x| x.is_refundable? } # Array
CashTransaction.where(id: cash_transactions.map(&:id)) # ActiveRecord_Relation
But this is an inefficient way of doing it, as the other answerers also mentioned.
If amount and total_refunded_amount are both columns of the cash_transactions table, you can do it in SQL, which will be much more efficient:
CashTransaction.where('amount > total_refunded_amount')
But if amount or total_refunded_amount is not an actual column in the database, you can't do it this way. Then, I guess, you have to do it the other way, which is less efficient than using raw SQL.
I think you should pre-compute the is_refundable result (in a new column) whenever a CashTransaction or its refunds (a has_many, I suppose?) are updated, by using callbacks:
class CashTransaction
  before_save :update_is_refundable
  def update_is_refundable
    # self. is needed here; without it Ruby only assigns a local variable
    self.is_refundable = amount > total_refunded_amount
  end
  def total_refunded_amount
    self.refunds.sum(:amount)
  end
end
class Refund
  belongs_to :cash_transaction
  after_save :update_cash_transaction_is_refundable
  def update_cash_transaction_is_refundable
    cash_transaction.update_is_refundable
    cash_transaction.save!
  end
end
Note: the above code should certainly be optimized to avoid some queries.
Then you can query the is_refundable column:
CashTransaction.where(is_refundable: true)
I think it's not bad to do this in two queries instead of a join, something like this:
def refundable
  where('amount > ?', total_refunded_amount)
end
This runs a single sum query and then uses the sum in the second query. Once the tables grow larger, you might find that this is faster than doing a join in the database.
Assuming this simplified schema:
users has_many discount_codes
discount_codes has_many orders
I want to grab all users, and if they happen to have any orders, only include the orders that were created between two dates. But if they don't have orders, or have orders only outside of those two dates, still return the users and do not exclude any users ever.
What I'm doing now:
users = User.all.includes(discount_codes: :orders)
users = users.where("orders.created_at BETWEEN ? AND ?", date1, date2).
  or(users.where(orders: { id: nil }))
I believe my OR clause allows me to retain users who do not have any orders whatsoever, but what happens is if I have a user who only has orders outside of date1 and date2, then my query will exclude that user.
For what it's worth, I want to use this orders where clause here specifically so I can avoid n + 1 issues later in determining orders per user.
Thanks in advance!
It doesn't make sense to try and control the orders that are loaded as part of the where clause for users. If you were to control that it'd have to be part of the includes (which I think means it'd have to be a part of the association).
Although it can technically combine them into a single query in some cases, ActiveRecord is going to do this as two queries.
The first query will be executed when you go to iterate over the users and will use that where clause to limit the users found.
It will then run a second query behind the scenes based on that includes statement. This will simply be a query to get all orders which are associated with the users that were found by the previous query. As such the only way to control the orders that are found through the user's where clause is to omit users from the result set.
If I were you, I would create an instance method on the User model for what you are looking for, but use a select block instead of where (this assumes User has an orders association, e.g. has_many :orders, through: :discount_codes):
def orders_in_timespan(start_time, end_time)
  orders.select { |o| o.created_at.between?(start_time, end_time) }
end
Because ActiveRecord caches the orders loaded by includes on each instance, I believe this will not result in n queries as long as your users query starts with an includes.
Something like:
render json: User.includes(:orders), methods: :orders_in_timespan
Of course, the easiest way to confirm the number of queries is to look at the logs. I believe this approach should have two queries regardless of the number of users being rendered (as likely does your code in the question).
Also, I'm not sure how familiar you are with SQL, but you can call .to_sql on things such as your users variable to see the SQL that would be generated, which might help shed some light on the discrepancies between what you're getting and what you're looking for.
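To illustrate the in-memory filtering idea without a database, here is a minimal plain-Ruby sketch. DemoUser and DemoOrder are hypothetical Structs standing in for the models with their orders already loaded:

```ruby
# Hypothetical stand-ins for User and Order with preloaded associations.
DemoOrder = Struct.new(:id, :created_at)
DemoUser = Struct.new(:name, :orders) do
  # Filters the already-loaded orders in Ruby; no extra query is involved.
  def orders_in_timespan(start_time, end_time)
    orders.select { |o| o.created_at.between?(start_time, end_time) }
  end
end

jan_start = Time.utc(2024, 1, 1)
jan_end   = Time.utc(2024, 1, 31)

users = [
  DemoUser.new("alice", [DemoOrder.new(1, Time.utc(2024, 1, 5)),
                         DemoOrder.new(2, Time.utc(2024, 3, 1))]),
  DemoUser.new("bob",   [DemoOrder.new(3, Time.utc(2024, 3, 1))]),
  DemoUser.new("carol", [])
]

# Every user stays in the list; only the orders are narrowed down.
report = users.map do |u|
  [u.name, u.orders_in_timespan(jan_start, jan_end).map(&:id)]
end
```

The point of the design is that the user list is never filtered: users with no orders, or only orders outside the range, simply end up with an empty list.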
Option 1: Write a custom query in SQL (ugly).
Option 2: Create 2 separate queries like below...
@users = User.limit(10)
@orders = Order.joins(:discount_code)
  .where(created_at: 10.days.ago..1.day.ago, discount_codes: { user_id: @users.select(:id) })
  .group_by { |order| order.discount_code.user_id }
Now you can use it like this ...
@users.each do |user|
  orders = @orders.fetch(user.id, []) # users without matching orders get an empty list
  puts user.name
  puts user.id
  puts orders.count
end
I hope this will solve your problem.
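One detail of the two-query approach is worth a small sketch: the hash returned by group_by has no entry for users without matching orders, so a lookup needs a default. The data below is made up for illustration:

```ruby
# Made-up rows standing in for the orders query result.
orders = [
  { id: 1, user_id: 10 },
  { id: 2, user_id: 10 },
  { id: 3, user_id: 30 }
]
user_ids = [10, 20, 30]

orders_by_user = orders.group_by { |o| o[:user_id] }

# A plain [] lookup returns nil for user 20, who has no orders.
missing = orders_by_user[20]

# fetch with a default keeps such users in the result with a count of 0.
counts = user_ids.map { |uid| [uid, orders_by_user.fetch(uid, []).size] }.to_h
```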
You need to use joins instead of includes. Rails joins use inner joins and will reject all the records which don't have associations.
User.joins(discount_codes: :orders).where(orders: {created_at: [10.days.ago..1.day.ago]}).distinct
This will give you all distinct users who placed orders in a given period of time.
user = User.joins(:discount_codes).joins(:orders).where("orders.created_at BETWEEN ? AND ?", date1, date2) +
User.left_joins(:discount_codes).left_joins(:orders).group("users.id").having("count(orders.id) = 0")
I would like to order a collection first by priority and then by due time, like this:
@ods = Od.order(:priority, :due_date_time)
The problem is that due_date_time is an instance method of Od, so I get
PG::UndefinedColumn: ERROR: column ods.due_date_time does not exist
I have tried the following, but it seems that sorting, mapping to ids, and then finding the records again with .where loses the sort order.
@ods = Od.where(id: (Od.all.sort {|a,b| a.due_date_time <=> b.due_date_time}.map(&:id))).order(:priority)
due_date_time calls a method from a child association:
def due_date_time
run.cut_off_time
end
run.cut_off_time is defined here:
def cut_off_time
(leave_date.beginning_of_day + route.cut_off_time_mins_since_midnight * 60)
end
I'm sure there is an easier way. Any help much appreciated! Thanks.
order in ActiveRecord is similar to sort in Ruby, except that it runs in the database. Od.all.sort iterates in Ruby after the Od.all query has run, map iterates again, and then where sends a new database query. The sort also has no effect, because where selects the records whose id is included in ids; it does not look up a record for each id in turn, so the rows come back in whatever order the database chooses.
It is easier to do something like this:
Od.all.sort_by { |od| [od.priority, od.due_date_time] }
But that is a slow solution if the ods table contains 10k+ records. Prefer saving the value you sort by as a column in the database. When that is not possible, put the logic that calculates due_date_time into the database query.
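As a rough illustration of the sort_by approach, and of how an id list can be used to restore an order after a second lookup, here is a plain-Ruby sketch. DemoOd is a hypothetical Struct, and the unordered array merely stands in for what where(id: ...) might return:

```ruby
# Hypothetical stand-in for Od with the computed due_date_time already filled in.
DemoOd = Struct.new(:id, :priority, :due_date_time)

ods = [
  DemoOd.new(1, 2, Time.utc(2024, 1, 3)),
  DemoOd.new(2, 1, Time.utc(2024, 1, 5)),
  DemoOd.new(3, 1, Time.utc(2024, 1, 2))
]

# Arrays compare element by element, so this sorts by priority first
# and by due_date_time within the same priority.
sorted = ods.sort_by { |od| [od.priority, od.due_date_time] }
sorted_ids = sorted.map(&:id)

# A lookup by id list gives no ordering guarantee; index the rows by id
# and walk the id list to get the intended order back.
unordered = ods
by_id = unordered.map { |od| [od.id, od] }.to_h
restored = sorted_ids.map { |id| by_id[id] }
```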
I have an Order model with a datetime column start and int columns arriving_dur, drop_off_dur, etc., which are durations in seconds from start.
Then in my model I have
class Order < ApplicationRecord
  def finish_time
    self.start + self.arriving_duration + self.drop_off_duration
  end
  # other def something_time ... end
end
I want to be able to do this:
Order.where(finish_time: Time.now..(Time.now+2.hours) )
But of course I can't, because there's no such column finish_time. How can I achieve such result?
I've read 4 possible solutions on SA:
1) eager load all orders and select them with a filter - that would not work well if there were more orders
2) have a parametrized scope for each time I need, but that means so much code duplication
3) have an SQL function for each time and bind it to the model with select() - it's just a pain
4) somehow use http://api.rubyonrails.org/classes/ActiveRecord/Attributes/ClassMethods.html#method-i-attribute ? But I have no idea how to use it for my case or whether it even solves the problem I have.
Do you have any idea or some 'best practice' how to solve this?
Thanks!
You have different options to implement this behaviour.
Add an additional finish_time column and update it whenever you create or update your time values. This could be done in Rails (with either before_validation or after_save callbacks) or with PostgreSQL triggers.
class Order < ApplicationRecord
  before_validation :update_finish_time
  private
  def update_finish_time
    self.finish_time = start_time + arriving_duration.seconds + drop_off_duration.seconds
  end
end
This is especially useful when you need finish_time in many places throughout your app. It has the downside that you need to manage that column with extra code and it stores data you actually already have. The upside is that you can easily create an index on that column should you ever have many orders and need to search on it.
An option could be to implement the finish-time update as a postgresql trigger instead of in rails. This has the benefit of being independent from your rails application (e.g. when other sources/scripts access your db too) but has the downside of splitting your business logic into many places (ruby code, postgres code).
Your second option is adding a virtual column just for your query.
def orders_within_the_next_2_hours
  finishing_orders = Order.select("*, (start_time + (arriving_duration + drop_off_duration) * interval '1 second') AS finish_time")
  Order.from("(#{finishing_orders.to_sql}) AS orders").where(finish_time: Time.now..(Time.now+2.hours) )
end
The code above creates the SQL query for finishing_orders, which is the orders table with the additional finish_time column. In the second line we use that finishing_orders SQL as the FROM clause ("cleverly" aliased to orders so Rails is happy). This way we can query finish_time as if it were a normal column.
The SQL is written for relatively old PostgreSQL versions (I guess it works for 9.3+). If you use make_interval instead of multiplying with interval '1 second', the SQL might be a little more readable (but it needs a newer PostgreSQL version, 9.4+ I think).
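To sanity-check the arithmetic behind the virtual column, the same computation can be sketched in plain Ruby. OrderRow is a hypothetical Struct, and a fixed timestamp replaces Time.now so the result is reproducible:

```ruby
# Hypothetical stand-in for an orders row; durations are in seconds.
OrderRow = Struct.new(:id, :start_time, :arriving_duration, :drop_off_duration) do
  # Same expression as the SQL: start_time plus both durations.
  def finish_time
    start_time + arriving_duration + drop_off_duration
  end
end

now = Time.utc(2024, 1, 1, 12, 0, 0)
window = now..(now + 2 * 3600) # the next two hours

rows = [
  OrderRow.new(1, now - 600, 900, 300),    # finishes 10 minutes from now
  OrderRow.new(2, now, 7200, 1800),        # finishes 2.5 hours from now
  OrderRow.new(3, now - 7200, 1800, 1800)  # finished an hour ago
]

finishing = rows.select { |r| window.cover?(r.finish_time) }
finishing_ids = finishing.map(&:id)
```

This is only a check of the logic; for real data the filtering belongs in the database, as described above.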
For a data analysis I need both results in one set.
a.follower_trackings.pluck(:date, :new_followers, :deleted_followers)
a.data_trackings.pluck(:date, :followed_by_count)
Instead of ugly-merging the arrays (they can have different starting dates, and I obviously need only those values where the date exists in both arrays), I thought about doing it in MySQL:
SELECT
  followers.new_followers,
  followers.deleted_followers,
  trackings.date,
  trackings.followed_by_count
FROM
  instagram_user_follower_trackings AS followers,
  instagram_data_trackings AS trackings
WHERE
  followers.date = trackings.date
  AND followers.user_id = 5
  AND trackings.user_id = 5
ORDER BY trackings.date DESC
This is working fine, but I wonder if I can write the same with ActiveRecord?
You can do the following which should render the same query as your raw SQL, but it's also quite ugly...:
a.follower_trackings.
merge(a.data_trackings).
from("instagram_user_follower_trackings, instagram_data_trackings").
where("instagram_user_follower_trackings.date = instagram_data_trackings.date").
order(:date => :desc).
pluck("instagram_data_trackings.date",
:new_followers, :deleted_followers, :followed_by_count)
A few tricks turned out to be useful while playing with the scopes: the merge trick adds the data_trackings.user_id = a.id condition, but it does not join in the data_trackings table. That's why the from clause has to be added, which essentially performs the INNER JOIN. The rest is pretty straightforward and leverages the fact that the order and pluck clauses do not need the table name to be specified if the columns are either unique among the tables or are specified in the SELECT (pluck).
Well, when looking again, I would probably rather define a scope for retrieving the data for a given user (a record) that would essentially use the raw SQL you have in your question. I might also define a helper instance method that would call the scope with self, something like:
class Model
  scope :tracking_info, ->(user) { ... }
  def tracking_info
    Model.tracking_info(self)
  end
end
Then one can use simply:
a = Model.find(1)
a.tracking_info
# => [[...], [...]]
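For comparison, the in-memory merge that the question wanted to avoid can be kept fairly small. This is a plain-Ruby sketch with made-up rows shaped like the two pluck results:

```ruby
require 'date'

# Shaped like pluck(:date, :new_followers, :deleted_followers).
followers = [
  [Date.new(2024, 1, 1), 5, 1],
  [Date.new(2024, 1, 2), 3, 0],
  [Date.new(2024, 1, 3), 4, 2]
]

# Shaped like pluck(:date, :followed_by_count).
trackings = [
  [Date.new(2024, 1, 2), 120],
  [Date.new(2024, 1, 3), 122],
  [Date.new(2024, 1, 4), 125]
]

# Index one side by date, then keep only dates present on both sides.
counts_by_date = trackings.to_h

merged = followers
  .select { |date, _new_f, _deleted_f| counts_by_date.key?(date) }
  .map { |date, new_f, deleted_f| [date, new_f, deleted_f, counts_by_date[date]] }
  .sort_by(&:first)
  .reverse # newest date first, like ORDER BY date DESC
```

This costs two queries plus a hash lookup per row, so the SQL join is still the better fit when the tables are large.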
Let's say I have a model:
class Result < ActiveRecord::Base
  attr_accessible :x, :y, :sum
end
Instead of doing
Result.all.find_each do |s|
  s.sum = compute_sum(s.x, s.y)
  s.save
end
assuming compute_sum is an available method and does some computation that cannot be translated into SQL.
def compute_sum(x,y)
  sum_table[x][y]
end
Is there a way to use update_all, probably something like:
Result.all.update_all(sum: compute_sum(:x, :y))
I have more than 80,000 records to update. Each record in find_each creates its own BEGIN and COMMIT queries, and each record is updated individually.
Or is there any other faster way to do this?
If the compute_sum function can't be translated into SQL, then you cannot do update_all on all records at once. You will need to iterate over the individual instances. However, if there are a lot of repeated sets of values in the columns, you could speed it up by doing the calculation only once per set of inputs, and then doing one mass update per calculation, e.g.
Result.all.group_by { |result| [result.x, result.y] }.each do |inputs, results|
  sum = compute_sum(*inputs)
  Result.where(id: results.map(&:id)).update_all(sum: sum)
end
You can replace result.x, result.y with the actual inputs to the compute_sum function.
EDIT - forgot to put the square brackets around result.x, result.y in the group_by block.
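The saving from grouping can be sketched in plain Ruby. The rows, sum_table values, and call counter below are made up for illustration; each entry in updates corresponds to one mass update:

```ruby
# Made-up rows standing in for Result records.
rows = [
  { id: 1, x: 0, y: 1 },
  { id: 2, x: 1, y: 1 },
  { id: 3, x: 0, y: 1 },
  { id: 4, x: 1, y: 1 },
  { id: 5, x: 1, y: 0 }
]

sum_table = [[10, 11], [20, 21]] # hypothetical lookup table
calls = 0

# Stand-in for compute_sum that also counts how often it is invoked.
compute_sum = lambda do |x, y|
  calls += 1
  sum_table[x][y]
end

# One computation per distinct [x, y] pair, paired with the ids to update.
updates = rows.group_by { |r| [r[:x], r[:y]] }.map do |inputs, group|
  [compute_sum.call(*inputs), group.map { |r| r[:id] }]
end
```

Five rows need only three computations and three updates here; the gain grows with the amount of repetition in the input columns.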
update_all makes an SQL query, so any processing you do on the values needs to be in SQL. So you'll need to find the SQL expression, in whichever DBMS you're using, that adds two numbers together. In Postgres, for example, I believe you would do
Sum.update_all("sum = x + y")
which will generate this SQL:
update sums set sum = x + y;
which will calculate the x + y value for each row, and set the sum field to the result. Note that the assignment is passed as a string; with the hash form sum: "x + y" the expression would be stored as a literal string value.
EDIT - for MariaDB. I've never used it, but plain addition should work the same way there. SUM() is an aggregate function over many rows, so it is not what you want for adding two columns of the same row. The SQL would be
update sums set sum = x + y;
Try this first, in your SQL console, for a single record. If it works, then you can do
Sum.update_all("sum = x + y")
in Rails.
EDIT2: there are a lot of things called sum here, which makes the example quite confusing. Here's a more generic example.
Set col_c to the result of adding col_a and col_b together, in class Foo:
Foo.update_all("col_c = col_a + col_b")
I just noticed that I'd copied the (incorrect) Sum.all.update_all from your question. It should just be Sum.update_all - I've updated my answer.
I'm a complete beginner, just wondering: why not add an instance method like the one below? Without adding a separate column in the db, you can still call sum on a record from outside.
def sum
  x + y
end
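A small plain-Ruby sketch of the distinction that matters here: x and y belong to a single record, so the method that reads them has to be an instance method. SumRow is a hypothetical class, not an ActiveRecord model:

```ruby
# Hypothetical stand-in for a record with x and y attributes.
class SumRow
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Instance method: computed on demand from this record's own values.
  def sum
    x + y
  end
end

row = SumRow.new(2, 3)
total = row.sum
```

The value is computed in Ruby each time it is read, so it cannot be used in where or order the way a real column can.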