I'm building a report in a Ruby on Rails application and I'm struggling to understand how to use a subquery.
Each 'Survey' has_many 'SurveyResponses', and it is simple enough to retrieve these. However, I need to group them by one of the fields, 'jobcode', because I only want to report the information relating to a single jobcode on one line of the report.
However I also need to know the constituent data that makes up the totals for that jobcode. The reason for this is that I need to calculate data such as medians and standard deviations and so need to know the values that make the total.
My thinking is that I retrieve the distinct jobcodes that were reported on for the survey and then as I loop through these I retrieve the individual responses for each jobcode.
Is this the correct way to do this or should I follow a different method?
You could use a named scope to simplify getting the groups of responses:
named_scope :job_group, lambda{|job_code| {:conditions => ["job_code = ?", job_code]}}
Put that in your response model, and use it like this:
job.responses.job_group('some job code')
and you'll get an array of responses. If you're looking to get the mean of the values of one of the attributes on the responses, you can use map:
r = job.responses.job_group('some job code')
r.map(&:total)
=> [1, 5, 3, 8]
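Since the question asks for medians and standard deviations, here is a minimal plain-Ruby sketch of computing them once the values have been collected with map. The Response struct and the helper names (mean, median, std_dev) are made up for illustration and stand in for the real response records. The standard deviation shown is the population form, which is an assumption about what the report needs.

```ruby
# Hypothetical stand-in for SurveyResponse records loaded from the database.
Response = Struct.new(:job_code, :total)

def mean(values)
  values.sum.to_f / values.size
end

def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Population standard deviation (divides by n, not n - 1).
def std_dev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.size)
end

responses = [
  Response.new("A1", 1), Response.new("A1", 5),
  Response.new("A1", 3), Response.new("A1", 8),
  Response.new("B2", 4), Response.new("B2", 6)
]

# One entry per job code, keeping the constituent values next to the aggregates.
report = responses.group_by(&:job_code).transform_values do |group|
  totals = group.map(&:total)
  { values: totals, sum: totals.sum, mean: mean(totals),
    median: median(totals), std_dev: std_dev(totals) }
end
```

Grouping in Ruby like this keeps the individual values available for each job code, which is what the median needs. A plain SQL GROUP BY would only return the aggregates.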
Alternatively, you might find it quicker to write custom SQL to get the mean or sum of groups of attributes. Doing this sort of aggregation in Ruby, through Rails, may be noticeably slower than letting the database do it.
ActiveRecord::Base.connection.execute("Custom SQL here")
You can also use Model.find_by_sql()
For example:
class User < ActiveRecord::Base
# Your usual AR model
end
...
def index
@users = User.find_by_sql "select * from users"
# etc
end
Assuming this simplified schema:
users has_many discount_codes
discount_codes has_many orders
I want to grab all users, and if they happen to have any orders, only include the orders that were created between two dates. But if they don't have orders, or have orders only outside of those two dates, still return the users and do not exclude any users ever.
What I'm doing now:
users = User.all.includes(discount_codes: :orders)
users = users.where("orders.created_at BETWEEN ? AND ?", date1, date2).
or(users.where(orders: { id: nil }))
I believe my OR clause allows me to retain users who do not have any orders whatsoever, but what happens is if I have a user who only has orders outside of date1 and date2, then my query will exclude that user.
For what it's worth, I want to use this orders where clause here specifically so I can avoid n + 1 issues later in determining orders per user.
Thanks in advance!
It doesn't make sense to try and control the orders that are loaded as part of the where clause for users. If you were to control that it'd have to be part of the includes (which I think means it'd have to be a part of the association).
Although technically it can combine them into a single query in some cases, ActiveRecord is going to do this as two queries.
The first query will be executed when you go to iterate over the users and will use that where clause to limit the users found.
It will then run a second query behind the scenes based on that includes statement. This will simply be a query to get all orders which are associated with the users that were found by the previous query. As such the only way to control the orders that are found through the user's where clause is to omit users from the result set.
If I were you, I would create an instance method in the User model for what you are looking for, but instead of using where, use a select block:
def orders_in_timespan(start, finish)
  # `end` is a reserved word in Ruby, so the upper bound is named `finish`
  orders.select { |o| o.created_at.between?(start, finish) }
end
Because ActiveRecord caches the orders loaded by includes on each instance, I believe this will not result in n queries, as long as your users query starts off with an includes.
Something like:
render json: User.includes(:orders), methods: :orders_in_timespan
Of course, the easiest way to confirm the number of queries is to look at the logs. I believe this approach should have two queries regardless of the number of users being rendered (as likely does your code in the question).
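The in-memory filtering idea can be tried without a database. The sketch below uses Struct stand-ins for the models (hypothetical, not the real ActiveRecord classes) and assumes each user's orders are already loaded. It shows that every user is kept while only the per-user order list is narrowed.

```ruby
# Hypothetical stand-ins for the ActiveRecord models; orders are assumed to be preloaded.
Order = Struct.new(:id, :created_at)

User = Struct.new(:name, :orders) do
  # Filters the already-loaded orders in Ruby, so no extra query would be needed.
  def orders_in_timespan(start, finish)
    orders.select { |order| order.created_at.between?(start, finish) }
  end
end

date1 = Time.utc(2020, 1, 1)
date2 = Time.utc(2020, 1, 31)

users = [
  User.new("no orders", []),
  User.new("outside range", [Order.new(1, Time.utc(2019, 6, 1))]),
  User.new("mixed", [Order.new(2, Time.utc(2020, 1, 15)), Order.new(3, Time.utc(2020, 3, 1))])
]

# Every user is kept; only the per-user order list is narrowed.
summary = users.map { |user| [user.name, user.orders_in_timespan(date1, date2).map(&:id)] }
```

The trade-off is that all orders for the listed users are loaded before filtering, which may matter if users have a long order history.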
Also, I'm not sure how familiar you are with SQL, but you can call .to_sql on a relation such as your users variable to see the SQL that would be generated, which might help shed some light on the discrepancies between what you're getting and what you're looking for.
Option 1: Write a custom query in SQL (ugly).
Option 2: Create 2 separate queries like below...
@users = User.limit(10)
@orders = Order.joins(:discount_code)
               .where(created_at: 10.days.ago..1.day.ago, discount_codes: { user_id: @users.select(:id) })
               .group_by { |order| order.discount_code.user_id }
Now you can use it like this ...
@users.each do |user|
  # users with no orders in the range have no key in the hash, so fall back to an empty list
  orders = @orders[user.id] || []
  puts user.name
  puts user.id
  puts orders.count
end
I hope this will solve your problem.
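One detail worth checking in the loop above: group_by returns a plain Hash, so a user with no orders in the date range has no entry at all. The following sketch, with made-up Struct stand-ins and sample rows, shows a lookup with a default so such users are still reported with a count of zero.

```ruby
# Hypothetical stand-ins; in the app these rows would come from the two queries.
User = Struct.new(:id, :name)
Order = Struct.new(:id, :user_id)

users = [User.new(1, "Ann"), User.new(2, "Bob"), User.new(3, "Cy")]
orders_in_range = [Order.new(10, 1), Order.new(11, 1), Order.new(12, 3)]

# group_by returns a plain Hash, so users without matching orders have no key.
orders_by_user = orders_in_range.group_by(&:user_id)

counts = users.map do |user|
  # fetch with a default avoids calling count on nil for users like Bob.
  [user.name, orders_by_user.fetch(user.id, []).count]
end
```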
You need to use joins instead of includes. Rails joins use inner joins and will reject all the records which don't have associations.
User.joins(discount_codes: :orders).where(orders: {created_at: [10.days.ago..1.day.ago]}).distinct
This will give you all distinct users who placed orders in a given period of time.
user = User.joins(:discount_codes).joins(:orders).where("orders.created_at BETWEEN ? AND ?", date1, date2) +
User.left_joins(:discount_codes).left_joins(:orders).group("users.id").having("count(orders.id) = 0")
In order to learn Ruby on Rails I am writing a web app that will be used to sort teams within a tournament given their performance to date.
The complication is that I want each tournament organiser (system user) to be able to use a variety of metrics in an arbitrary order.
Expressed as SQL (my background) I want User 1 to be able to choose:
ORDER BY
METRIC1
,METRIC2
,METRIC3
Whilst User 2 could choose:
ORDER BY
METRIC2
,METRIC3
,METRIC1
How would I accept this user input and use it to create a query on the Team table?
Edit 1 Neglected to mention (sorry) that the metrics themselves are calculated on the fly. Currently they are instance methods (e.g. @team.metric1 etc). The abortive attempts I have made so far all involve trying to convert user strings to method names, which just seems wrong (and I haven't been able to get it to work).
Edit 2 some example code in teams_controller.rb:
class Team < ApplicationRecord
belongs_to :tournament
has_many :matches
def score_for
matches.sum(:score_for)
end
def score_diff
matches.sum(:score_for) - matches.sum(:score_against)
end
end
ActiveRecord allows multiple arguments to be passed to the order method. So you could do something like:
Team.order(:metric2, :metric3, metric1: :desc)
Another option is to use ActiveRecord to construct the query dynamically. ActiveRecord queries are lazily evaluated, so the SQL won't be executed until you call an operation that requires loading the records.
For example you could construct a scope on Team like this:
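class Team < ApplicationRecord
  # Passing the whole list to a single order call keeps the scope returning a relation;
  # looping with each would return the input array instead.
  scope :custom_order, lambda { |sorting_order|
    order(*sorting_order)
  }
end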
You would then pass in a collection of attributes in the order you want the ORDER BY clauses applied. For example:
Team.custom_order([:metric2, :metric3, :metric1])
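Because the sort order comes from the user, it is worth checking it against a whitelist before it reaches order. The sketch below is plain Ruby and only builds the arguments. The constant names, the order_arguments helper and the "column direction" input format are all assumptions made for illustration.

```ruby
# Hypothetical whitelist of sortable columns; the names are illustrative only.
ALLOWED_METRICS = %w[metric1 metric2 metric3].freeze
ALLOWED_DIRECTIONS = %w[asc desc].freeze

# Turns user input such as ["metric2", "metric3 desc"] into arguments for order.
def order_arguments(user_choices)
  user_choices.filter_map do |choice|
    column, direction = choice.to_s.downcase.split
    next unless ALLOWED_METRICS.include?(column)

    direction = "asc" unless ALLOWED_DIRECTIONS.include?(direction)
    { column.to_sym => direction.to_sym }
  end
end

args = order_arguments(["metric2", "metric3 desc", "id; DROP TABLE teams", "metric1 sideways"])
```

The result could then be splatted into the query, for example Team.order(*args). Note that this only helps when the metrics are real columns or SQL expressions; metrics computed in Ruby instance methods, as in Edit 1 of the question, cannot be sorted by the database this way.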
A working but probably awful solution:
class Tournament < ApplicationRecord
has_many :teams
serialize :tiebreaker, Array
TIEBREAKER_WHITELIST = %w[score opponent_score possession].freeze
def sorted_teams
list = teams.shuffle
(tiebreaker & TIEBREAKER_WHITELIST).reverse.each do |metric|
list = list.sort_by { |team| [team.send(metric), list.find_index(team)] }
end
list.reverse
end
end
Each tournament has many teams. A tournament instance has a serialized field called tiebreaker. This contains an array of strings something like ["score", "possession"] where each string matches the name of a public instance method on team. Each of these methods returns a number.
The tiebreaker field is in descending order of precedence, so for the above example I would only expect possession to affect sorting for teams with an equal score.
list = teams.shuffle - this randomises the list to start with, in case teams are tied for all of the following tiebreakers.
(tiebreaker & TIEBREAKER_WHITELIST) - this returns only strings that appear in both the tiebreaker field and the whitelist constant, to protect against end users running arbitrary methods. The tiebreaker array is the receiver because Array#& keeps the order of its receiver, and the organiser's chosen order is what matters here.
.reverse.each do |metric| - this reverses the array of metrics so that the list is sorted by the lowest precedence metric first.
[team.send(metric), list.find_index(team)] - this is the sort for each metric. send turns the string into a method call. I found find_index was necessary to preserve sort order from previous sorts, i.e. if I had first sorted for possession this would preserve the order for teams with the same score.
list.reverse - reverse the list then return it. This was because I wanted higher scoring/possession teams first on my list and sort_by sorts ascending.
I wanted some metrics sorted ascending (opponent_score) and others descending (score) so I handled this in the respective methods, returning negative values for opponent_score for example.
I'm not entirely happy with the solution as is but it does seem to work!
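A possibly simpler variant of the same idea is to sort once on an array of metric values, since Ruby compares arrays element by element. This is a sketch rather than a drop-in replacement: the Team struct is a stand-in for the real model, every metric is assumed to be "higher is better", and the random shuffle for complete ties is left out.

```ruby
# Hypothetical stand-in for Team; the metric methods would be computed in the real model.
Team = Struct.new(:name, :score, :possession)

ALLOWED = %w[score possession].freeze

# Sorts descending on each metric in turn; arrays compare element by element,
# so later metrics only matter when earlier ones are equal.
def sorted_teams(teams, tiebreaker)
  metrics = tiebreaker & ALLOWED # keeps the organiser's order, drops unknown names
  teams.sort_by { |team| metrics.map { |metric| -team.public_send(metric) } }
end

teams = [
  Team.new("Reds", 10, 40),
  Team.new("Blues", 12, 55),
  Team.new("Greens", 10, 60)
]

by_score = sorted_teams(teams, %w[score possession]).map(&:name)
by_possession = sorted_teams(teams, %w[possession score]).map(&:name)
```

Negating each value gives a descending sort without the final reverse, and public_send refuses to call private methods, which is a small extra safeguard on top of the whitelist.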
For a data analysis I need both results in one set.
a.follower_trackings.pluck(:date, :new_followers, :deleted_followers)
a.data_trackings.pluck(:date, :followed_by_count)
Instead of an ugly array merge (they can have different starting dates, and I obviously only need the values where the date exists in both arrays), I thought about doing it in MySQL:
SELECT
followers.new_followers,
followers.deleted_followers,
trackings.date,
trackings.followed_by_count
FROM
instagram_user_follower_trackings AS followers,
instagram_data_trackings AS trackings
WHERE
followers.date = trackings.date
AND
followers.user_id=5
AND
trackings.user_id=5
ORDER
BY trackings.date DESC
This is working fine, but I wonder if I can write the same with ActiveRecord?
You can do the following which should render the same query as your raw SQL, but it's also quite ugly...:
a.follower_trackings.
merge(a.data_trackings).
from("instagram_user_follower_trackings, instagram_data_trackings").
where("instagram_user_follower_trackings.date = instagram_data_trackings.date").
order(:date => :desc).
pluck("instagram_data_trackings.date",
:new_followers, :deleted_followers, :followed_by_count)
A few tricks turned out to be useful while playing with the scopes. The merge trick adds the data_trackings.user_id = a.id condition, but it does not join in the data_trackings table; that's why the from clause has to be added, which essentially performs the INNER JOIN. The rest is pretty straightforward and leverages the fact that the order and pluck clauses do not need the table name to be specified if the columns are either unique among the tables or are specified in the SELECT (pluck).
Well, when looking again, I would probably rather define a scope for retrieving the data for a given user (a record) that would essentially use the raw SQL you have in your question. I might also define a helper instance method that would call the scope with self, something like:
class Model < ActiveRecord::Base
  scope :tracking_info, ->(user) { ... }
  def tracking_info
    Model.tracking_info(self)
  end
end
Then one can use simply:
a = Model.find(1)
a.tracking_info
# => [[...], [...]]
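For comparison, the in-Ruby merge that the question wanted to avoid does not have to be very ugly. The sketch below works on arrays shaped like the two pluck results, with made-up sample values, and keeps only the dates present on both sides, newest first. It is an alternative under those assumptions, not a replacement for the SQL approach on large tables.

```ruby
require "date"

# Rows shaped like the two pluck results in the question (values are made up).
follower_rows = [
  [Date.new(2016, 1, 1), 5, 2],
  [Date.new(2016, 1, 2), 7, 1],
  [Date.new(2016, 1, 3), 4, 3]
]
data_rows = [
  [Date.new(2016, 1, 2), 120],
  [Date.new(2016, 1, 3), 121],
  [Date.new(2016, 1, 4), 125]
]

# Index one side by date, then keep only dates present on both sides.
followed_by_date = data_rows.to_h

merged = follower_rows.filter_map do |date, new_followers, deleted_followers|
  next unless followed_by_date.key?(date)

  [new_followers, deleted_followers, date, followed_by_date[date]]
end.sort_by { |row| row[2] }.reverse
```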
I have the following methods in a model named CashTransaction.
def is_refundable?
self.amount > self.total_refunded_amount
end
def total_refunded_amount
self.refunds.sum(:amount)
end
Now I need to extract all the records which satisfy the above function i.e records which return true.
I got that working by using following statement:
CashTransaction.all.map { |x| x if x.is_refundable? }
But the result is an Array. I am looking for ActiveRecord_Relation object as I need to perform join on the result.
I feel I am missing something here as it doesn't look that difficult. Anyways, it got me stuck. Constructive suggestions would be great.
Note: Just amount is a CashTransaction column.
EDIT
The following SQL does the job. If I can convert it to ORM calls, it will still do the job.
SELECT `cash_transactions`.* FROM `cash_transactions` INNER JOIN `refunds` ON `refunds`.`cash_transaction_id` = `cash_transactions`.`id` WHERE (cash_transactions.amount > (SELECT SUM(`amount`) FROM `refunds` WHERE refunds.cash_transaction_id = cash_transactions.id GROUP BY `cash_transaction_id`));
Sharing Progress
I managed to get it to work with the following ORM query:
CashTransaction
.joins(:refunds)
.group('cash_transactions.id')
.having('cash_transactions.amount > sum(refunds.amount)')
But what I was actually looking for was something like:
CashTransaction.joins(:refunds).where(is_refundable? : true)
where is_refundable? is a model function. Initially I thought setting is_refundable? as an attr_accessor would work, but I was wrong.
Just a thought: can the problem be fixed in an elegant way using Arel?
There are two options.
1) Finish what you have started (which is extremely inefficient for larger amounts of data, since everything is loaded into memory before processing):
CashTransaction.all.select(&:is_refundable?) # returns only the refundable records, without the nils that map leaves behind
So get the ids:
ids = CashTransaction.all.select(&:is_refundable?).map(&:id)
And now, to get an ActiveRecord relation:
CashTransaction.where(id: ids) # will return a relation
2) Move the calculation to SQL:
CashTransaction.where('amount > total_refunded_amount')
The second option is faster and more efficient in every way.
When you deal with database, try to process it on the database level, with smallest Ruby involvement possible.
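To make the first, in-memory option concrete, here is a small sketch with a Struct stand-in for the model (hypothetical, with refunds reduced to plain numbers). It shows why select is the better fit than map with a trailing if: map leaves a nil for every record that fails the check.

```ruby
# Hypothetical stand-ins for the models; refunds are plain numbers here.
CashTransaction = Struct.new(:id, :amount, :refunds) do
  def total_refunded_amount
    refunds.sum
  end

  def is_refundable?
    amount > total_refunded_amount
  end
end

transactions = [
  CashTransaction.new(1, 100, [30, 20]), # partly refunded
  CashTransaction.new(2, 50, [50]),      # fully refunded
  CashTransaction.new(3, 80, [])         # never refunded
]

with_map = transactions.map { |t| t if t.is_refundable? } # leaves a nil for id 2
refundable_ids = transactions.select(&:is_refundable?).map(&:id)
```

Transaction 3 has no refunds and still counts as refundable here. A query that uses an INNER JOIN on refunds would not return such a row, so a SQL version would probably need a LEFT JOIN with COALESCE around the sum to match the Ruby method.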
EDIT
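According to the edited question, here is how you would achieve the desired result (an aggregate such as SUM cannot appear in WHERE, so the comparison goes in HAVING):
CashTransaction.joins(:refunds).group('cash_transactions.id').having('cash_transactions.amount > SUM(refunds.amount)')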
EDIT #2
As to the updates in your question: I don't really understand why you have latched onto using is_refundable? as an instance method inside a query, which is basically not possible in AR, but...
My suggestion is to create a scope is_refundable:
scope :is_refundable, -> { CashTransaction
.joins(:refunds)
.group('cash_transactions.id')
.having('cash_transactions.amount > sum(refunds.amount)')
}
Now it is available in notation as short as
CashTransaction.is_refundable
which is shorter and clearer than the intended
CashTransaction.where('is_refundable = ?', true)
You can do it this way:
cash_transactions = CashTransaction.all.select(&:is_refundable?) # Array, without the nils that map would leave
CashTransaction.where(id: cash_transactions.map(&:id)) # ActiveRecord_Relation
But this is an inefficient way of doing it, as the other answerers also mentioned.
If amount and total_refunded_amount are both columns of the cash_transactions table, you can do it in SQL, which will be much more efficient:
CashTransaction.where('amount > total_refunded_amount')
But if amount or total_refunded_amount is not an actual column in the database, you can't do it this way. In that case, I guess you have to do it the other way, which is less efficient than using raw SQL.
I think you should pre-compute the is_refundable result (in a new column) whenever a CashTransaction or one of its refunds (presumably a has_many) is updated, by using callbacks:
class CashTransaction
  before_save :update_is_refundable
  def update_is_refundable
    # self. is required; a bare `is_refundable = ...` would only set a local variable
    self.is_refundable = amount > total_refunded_amount
  end
  def total_refunded_amount
    self.refunds.sum(:amount)
  end
end
class Refund
belongs_to :cash_transaction
after_save :update_cash_transaction_is_refundable
def update_cash_transaction_is_refundable
cash_transaction.update_is_refundable
cash_transaction.save!
end
end
Note: the above code should certainly be optimized to avoid some queries.
Then you can query the is_refundable column:
CashTransaction.where(is_refundable: true)
I think it's not bad to do this in two queries instead of a join, something like this:
def refundable
where('amount < ?', total_refunded_amount)
end
This will do a single sum query and then use the sum in the second query. When the tables grow larger, you might find that this is faster than doing a join in the database.
I would like to have your opinion on a project I am currently working on.
class Product
has_many :orders
end
class Order
attr_accessor :deliverable # to contain temporary data on how many items can be delivered for this order
belongs_to :product
end
Somehow I want to have
Order.all_deliverable
that will calculate the Product's quantity, subtract from list of Orders until the Product is empty or there is no more Order for this Product
to illustrate
Product A, quantity: 20
Product B, quantity: 0
Order 1, require Product A, quantity: 12
Order 2, require Product B, quantity: 10
Order 3, require Product A, quantity: 100
So if I call Order.all_deliverable, it will give
Order 1, deliverable:12
Order 3, deliverable: 8 #(20-12)
I have been thinking about using named_scope, but I think the logic will be too complex to put in a named_scope. Any suggestions?
The pseudo code for all_deliverable will be something like this:
go to each order
  find the remaining quantity for the specific product
  deduct up to the ordered amount from the product; if there is not enough product, deduct whatever remains
  add the deducted amount to the order
end
From what I have read on the web, named_scope is used mostly like find and does not involve much method calling or looping.
I would use a class method. Named scopes are good for adding to the options list you normally pass to find. You should make them as simple as possible, so that callers can chain them together in a way that makes sense in a particular context, and that allow the scopes to be reused.
Design aside, I'm not sure this can work as a named scope anyway:
Scopes return proxies that delay loading from the database until you access them. I'm not sure how you'd do that when you're computing the records to return.
I'm not sure you can set non-column attributes from within a scope.
Even if the above two items don't apply, the delayed load of scopes means you build it now, but potentially don't load the data until some later time, when it could be stale.
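A class method along the lines suggested above might look roughly like the following. This is a sketch under stated assumptions: Product and Order are Struct stand-ins rather than the real models, the function takes the orders as an argument instead of loading them, and orders are served in the sequence given (the question does not say how they should be prioritised).

```ruby
# Hypothetical stand-ins for the models; in Rails this logic would sit in a class method on Order.
Product = Struct.new(:name, :quantity)
Order = Struct.new(:id, :product, :quantity, :deliverable)

# Walks the orders in the given sequence, allocating each product's stock until it runs out.
def all_deliverable(orders)
  remaining = {} # stock left per product name
  orders.each_with_object([]) do |order, result|
    name = order.product.name
    remaining[name] ||= order.product.quantity
    amount = [order.quantity, remaining[name]].min
    next if amount.zero?

    remaining[name] -= amount
    order.deliverable = amount
    result << order
  end
end

product_a = Product.new("A", 20)
product_b = Product.new("B", 0)
orders = [
  Order.new(1, product_a, 12),
  Order.new(2, product_b, 10),
  Order.new(3, product_a, 100)
]

deliverable = all_deliverable(orders).map { |o| [o.id, o.deliverable] }
```

With the sample data from the question this yields order 1 with 12 and order 3 with 8, and order 2 is left out because product B has no stock. Because the result is a plain array computed at call time, it sidesteps the lazy-loading concerns listed above, at the cost of not being chainable like a scope.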
If you just want to manipulate things in a named scope, you can do it like this:
named_scope :foobar, lambda {
# do anything here.
# return hash with options for the named scope
{
:order => whatever,
:limit => 50
}
}
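Be aware that Rails 3 renames named_scope to scope and moves away from the hash-style options shown above (:conditions, :order, :limit) in favour of chainable methods such as where, order and limit.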