How to get weighted average grouped by a column - ruby-on-rails

I have a model Company that has columns pbr, market_cap and category.
To get averages of pbr grouped by category, I can use the group method.
Company.group(:category).average(:pbr)
But there is no method for weighted average.
To get weighted averages I need to run this SQL code.
select case when sum(market_cap) = 0 then 0 else sum(pbr * market_cap) / sum(market_cap) end as weighted_average_pbr, category AS category FROM "companies" GROUP BY "companies"."category";
In psql this query works fine, but I don't know how to run it from Rails.
sql = %q(select case when sum(market_cap) = 0 then 0 else sum(pbr * market_cap) / sum(market_cap) end as weighted_average_pbr, category AS category FROM "companies" GROUP BY "companies"."category";)
ActiveRecord::Base.connection.select_all(sql)
returns an error:
output error: #<NoMethodError: undefined method `keys' for #<Array:0x007ff441efa618>>
It would be best if I can extend Rails method so that I can use
Company.group(:category).weighted_average(:pbr)
But I heard that extending Rails query methods is a bit tricky, so for now I just want to know how to run the SQL and get its result from Rails.
Does anyone know how to do it?
Version
rails: 4.2.1

What version of Rails are you using? I don't get that error with Rails 4.2. In Rails 3.2 select_all used to return an Array, and in 4.2 it returns an ActiveRecord::Result. But in either case, it is correct that there is no keys method. Instead you need to call keys on each element of the Array or Result. It sounds like the problem isn't from running the query, but from what you're doing afterward.
In any case, to get the more fluent approach you've described, you could do this:
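class Company < ActiveRecord::Base
  # Adds a weighted_average_<col> attribute, weighted by market_cap.
  # col is interpolated into the SQL, so never pass user input here.
  scope :weighted_average, lambda { |col|
    select("companies.category").
      select(<<-EOQ)
        (CASE WHEN SUM(market_cap) = 0 THEN 0
              ELSE SUM(#{col} * market_cap) / SUM(market_cap)
         END) AS weighted_average_#{col}
      EOQ
  }
end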
This will let you say Company.group(:category).weighted_average(:pbr), and you will get a collection of Company instances. Each one will have an extra weighted_average_pbr attribute, so you can do this:
Company.group(:category).weighted_average(:pbr).each do |c|
puts c.weighted_average_pbr
end
These instances will not have their normal attributes, but they will have category. That is because they do not represent individual Companies, but groups of companies with the same category. If you want to group by something else, you could parameterize the lambda to take the grouping column. In that case you might as well move the group call into the lambda too.
Now be warned that the parameter to weighted_average goes straight into your SQL query without escaping, since it is a column name. So make sure you don't pass user input to that method, or you'll have a SQL injection vulnerability. In fact I would probably put a guard inside the lambda, something like raise "NOPE" unless col =~ %r{\A[a-zA-Z0-9_]+\z}.
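The guard mentioned above could be pulled into a small helper. This is only a sketch: safe_column! and SAFE_COLUMN are hypothetical names, not part of Rails, and the pattern is anchored with \z so that a trailing newline is rejected as well.

```ruby
# Hypothetical helper (not part of Rails): accept only plain identifiers
# before a column name is interpolated into a SQL string.
SAFE_COLUMN = /\A[a-zA-Z0-9_]+\z/

def safe_column!(col)
  name = col.to_s
  raise ArgumentError, "unsafe column name: #{name.inspect}" unless name =~ SAFE_COLUMN
  name
end

safe_column!(:pbr) # returns "pbr"

begin
  safe_column!("pbr); DROP TABLE companies; --")
rescue ArgumentError => e
  puts e.message # the injection attempt is rejected before any SQL is built
end
```

Inside the scope, the lambda would call safe_column!(col) first and interpolate the returned string.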
The more general lesson is that you can use select to include extra SQL expressions, and have Rails magically treat those as attributes on the instances returned from the query.
Also note that unlike with select_all where you get a bunch of hashes, with this approach you get a bunch of Company instances. So again there is no keys method! :-)
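To sanity-check what the SQL returns, the same weighted average can be computed in plain Ruby on a handful of rows. This is a sketch with made-up sample data; weighted_average_by is a hypothetical helper, and for real tables the database query remains the better place to do this.

```ruby
# Made-up sample rows; in the real app these live in the companies table.
companies = [
  { category: "tech",  pbr: 2.0, market_cap: 100.0 },
  { category: "tech",  pbr: 4.0, market_cap: 300.0 },
  { category: "banks", pbr: 1.0, market_cap: 0.0 }
]

# Mirrors the SQL: SUM(value * weight) / SUM(weight) per group,
# falling back to 0 when the total weight is 0.
def weighted_average_by(rows, group_key, value_key, weight_key)
  rows.group_by { |r| r[group_key] }.each_with_object({}) do |(group, members), out|
    total_weight = members.sum { |r| r[weight_key] }
    weighted_sum = members.sum { |r| r[value_key] * r[weight_key] }
    out[group] = total_weight.zero? ? 0 : weighted_sum / total_weight
  end
end

result = weighted_average_by(companies, :category, :pbr, :market_cap)
# tech: (2.0 * 100 + 4.0 * 300) / 400 = 3.5; banks: total weight 0, so 0
```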

Related

Can I force the execution of an active record query chain?

I have an edge case where I want to use .first only after my SQL query has been executed.
My case is the following:
User.select("sum((type = 'foo')::int) as foo_count",
"sum((type = 'bar')::int) as bar_count")
.first
.yield_self { |r| r.bar_count / r.foo_count.to_f }
However, this would throw an SQL error saying that I should include my user_id in the GROUP BY clause. I've already found a hacky solution using to_a, but I really wonder if there is a proper way to force execution before my call to .first.
The error occurs because first adds an ORDER BY on the primary key (id) when no order is defined.
"Find the first record (or first N records if a parameter is supplied). If no order is defined it will order by primary key."
Instead try take
"Gives a record (or N records if a parameter is supplied) without any implied order. The order will depend on the database implementation. If an order is supplied it will be respected."
So
User.select("sum((type = 'foo')::int) as foo_count",
"sum((type = 'bar')::int) as bar_count")
.take
.yield_self { |r| r.bar_count / r.foo_count.to_f }
should work appropriately; however, as stated, the order is indeterminate.
You may want to use pluck, which retrieves only the data, instead of select, which just alters which fields get loaded into models:
User.pluck(
"sum((type = 'foo')::int) as foo_count",
"sum((type = 'bar')::int) as bar_count"
).map do |foo_count, bar_count|
bar_count / foo_count.to_f
end
You can probably do the division in the query as well if necessary.

Ordering a collection by instance method

I would like to order a collection first by priority and then due time like this:
@ods = Od.order(:priority, :due_date_time)
The problem is due_date_time is an instance method of Od, so I get
PG::UndefinedColumn: ERROR: column ods.due_date_time does not exist
I have tried the following, but sorting, mapping to ids, and then finding the records again with .where seems to lose the sort order.
@ods = Od.where(id: (Od.all.sort {|a,b| a.due_date_time <=> b.due_date_time}.map(&:id))).order(:priority)
due_date_time calls a method from a child association:
def due_date_time
run.cut_off_time
end
run.cut_off_time is defined here:
def cut_off_time
(leave_date.beginning_of_day + route.cut_off_time_mins_since_midnight * 60)
end
I'm sure there is an easier way. Any help much appreciated! Thanks.
ActiveRecord's order is similar to Ruby's sort, except that it runs in the database. Od.all.sort iterates over the records in Ruby after the Od.all query has run, map iterates over them a second time, and where then sends a new database query. The sort is also pointless here, because where selects every record whose id is included in the list of ids; it does not look up one record per id in the order given.
It is easier to do something like this:
Od.all.sort_by { |od| [od.priority, od.due_date_time] }
But that is a slow solution if the ods table holds 10k+ records. Prefer storing the column you sort by in the database. When that is not possible, compute due_date_time inside the database query.
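A plain-Ruby sketch of the in-memory approach is below. Od here is a Struct stand-in for the model, with due_date_time as a simple attribute (in the question it is computed from run.cut_off_time), and the sample times are made up. It also shows one way to restore a known id order after re-fetching records, since WHERE id IN (...) gives no ordering guarantee.

```ruby
# Stand-in for the Od model, for illustration only.
Od = Struct.new(:id, :priority, :due_date_time)

ods = [
  Od.new(1, 2, Time.utc(2015, 6, 1, 9)),
  Od.new(2, 1, Time.utc(2015, 6, 1, 12)),
  Od.new(3, 1, Time.utc(2015, 6, 1, 8))
]

# Arrays compare element by element, so this sorts by priority first
# and by due_date_time within equal priorities.
sorted = ods.sort_by { |od| [od.priority, od.due_date_time] }
sorted_ids = sorted.map(&:id) # [3, 2, 1]

# Pretend these came back from where(id: sorted_ids) in arbitrary order,
# then put them back into the order of sorted_ids.
fetched = ods.shuffle
by_id = fetched.each_with_object({}) { |od, h| h[od.id] = od }
reordered = sorted_ids.map { |id| by_id[id] }
```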

includes/joins case in rails 4

I have a habtm relationship between my Product and Category model.
I'm trying to write a query that searches for products with minimum of 2 categories.
I got it working with the following code:
p = Product.joins(:categories).group("product_id").having("count(product_id) > 1")
p.length # 178
When iterating over it, though, each call to product.categories makes a new call to the database, which is not good. I want to prevent these calls and get the same result. Doing more research, I've seen that I could include (includes) my categories table, which would load the whole table into memory so that it's not necessary to call the database again when iterating. So I got it working with the following code:
p2 = Product.includes(:categories).joins(:categories).group("product_id").having("count(product_id) > 1")
p2.length # 178 - I compared and the objects are the same as last query
Here comes what I am confused about:
p.first.eql? p2.first # true
p.first.categories.eql? p2.first.categories # false
p.first.categories.length # 2
p2.first.categories.length # 1
Why, with the includes query, do I get the right objects but not the right categories association?
It has something to do with the group method. Your p2 only contains the first category for each product.
You could break this up into two queries:
product_ids = Product.joins(:categories).group("product_id").having("count(product_id) > 1").pluck(:product_id)
result = Product.includes(:categories).find(product_ids)
Yeah, you hit the database twice, but at least you don't go to the database when you're iterating.
You should know that includes doesn't play well with joins (joins will just suppress the former).
Also, when you include an association, ActiveRecord figures out whether to use eager_load (with a left join) or preload (with a separate query). includes is just a wrapper for one of those two.
The thing is, preload plays well with joins! So you can do this:
products = Product.preload(:categories). # this will trigger a separate query
joins(:categories). # this will build the relevant query
group("products.id").
having("count(product_id) > 1").
select("products.*")
Note that this will also hit the database twice, but you will not have any O(n) query.
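As a rough plain-Ruby analogy of the two steps (not ActiveRecord code), suppose the join table is just a list of [product_id, category_id] pairs. The data and variable names below are made up for illustration.

```ruby
# Hypothetical rows of the products/categories join table: [product_id, category_id]
join_rows = [[1, 10], [1, 11], [2, 10], [3, 12], [3, 13], [3, 14]]

# Step 1 (the GROUP BY ... HAVING count > 1 query):
# ids of products that have at least two categories.
counts = Hash.new(0)
join_rows.each { |product_id, _category_id| counts[product_id] += 1 }
product_ids = counts.select { |_id, n| n > 1 }.keys # [1, 3]

# Step 2 (conceptually what the separate preload query achieves):
# fetch every category of those products in one pass and index them,
# so iterating over products needs no further lookups.
categories_by_product = Hash.new { |h, k| h[k] = [] }
join_rows.each do |product_id, category_id|
  categories_by_product[product_id] << category_id if product_ids.include?(product_id)
end
```

The point of the analogy is that filtering (step 1) and loading the associated records (step 2) are independent, so the grouping in step 1 cannot collapse the categories loaded in step 2.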

querying active record

I am trying to query my Postgres db from Rails with the following query:
def is_manager(team)
User.where("manager <> 0 AND team_id == :team_id", {:team_id => team.id})
end
This basically checks that the manager is flagged and that team.id is the id of the team passed into the function.
I have the following code in my view:
%td= is_manager(team)
The error, or rather what we get back, is:
#<ActiveRecord::Relation:0xa3ae51c>
Any help on where I have gone wrong would be great.
Queries to ActiveRecord always return ActiveRecord::Relations. Doing so essentially allows the lazy loading of queries. To understand why this is cool, consider this:
User.where(manager: 0).where(team_id: team_id).first
In this case, we get all users who aren't managers, then narrow that down to the non-manager users who are on the team with id team_id, and then select the first one. Executing this code will give you a query like:
SELECT * FROM users WHERE manager = 0 AND team_id = X LIMIT 1
As you can see, even though our code chained multiple query methods, ActiveRecord was able to squish all of that down into one query. This is done through the Relation. As soon as we need the actual object (i.e. when we call first), ActiveRecord goes to the DB to get the records. This prevents unnecessary queries. ActiveRecord is able to do this because these methods return Relations instead of the queried objects. The best way to think of a Relation is as a query that also has all the methods of an array: you can call further query methods on it, but you can also iterate over it.
Sorry if that isn't clear.
Oh, and to solve your problem: %td= is_manager(team).to_a. This will convert the Relation object into an array of Users.
Just retrieve the first record with .first; this might help.
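The lazy behaviour described above can be imitated with a toy class. This is only an illustration of the idea and is not how ActiveRecord::Relation is implemented; LazyQuery, its executions counter, and the sample users are all invented for the sketch.

```ruby
# Toy illustration only; NOT ActiveRecord's implementation.
class LazyQuery
  include Enumerable # provides first, to_a, map, ... on top of each

  def initialize(rows, conditions = {}, counter = [0])
    @rows = rows
    @conditions = conditions
    @counter = counter # shared across the chain, counts "queries" run
  end

  # Chaining returns a new LazyQuery and does not touch the data.
  def where(extra)
    LazyQuery.new(@rows, @conditions.merge(extra), @counter)
  end

  # Only here is the "query" actually executed.
  def each(&block)
    @counter[0] += 1
    @rows.select { |row| @conditions.all? { |k, v| row[k] == v } }.each(&block)
  end

  def executions
    @counter[0]
  end
end

users = [
  { name: "ann", manager: 0, team_id: 1 },
  { name: "bob", manager: 1, team_id: 1 },
  { name: "cat", manager: 0, team_id: 2 }
]

query = LazyQuery.new(users).where(manager: 0).where(team_id: 1)
before = query.executions   # 0, nothing has run yet
first_name = query.first[:name] # "ann", this is the point where the query runs
after = query.executions    # 1
```

Printing query itself in a view would show the query object, much like the #<ActiveRecord::Relation:...> output in the question; calling first or to_a is what produces actual records.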
User.where("manager <> 0 AND team_id = :team_id", {:team_id => team.id}).first

Modifying the returned value of find_by_sql

So I am pulling my hair out over this issue / gotcha. Basically, I used find_by_sql to fetch data from my database. I did this because the query has lots of columns and table joins, and I think using ActiveRecord and associations will slow it down.
I managed to pull the data, and now I want to modify the returned values. I did this by looping through the result, for example:
a = Project.find_by_sql("SELECT mycolumn, mycolumn2 FROM my_table").each do |project|
project['mycolumn'] = project['mycolumn'].split('_').first
end
What I found out is that project['mycolumn'] was not changed at all.
So my question:
Does find_by_sql return an array of Hashes?
Is it possible to modify the value of one of the attributes of a hash as shown above?
Here is the code : http://pastie.org/4213454 . If you can have a look at summarize_roles2() that's where the action is taking place.
Thank you. I'm using Rails 2.1.1 and Ruby 1.8. I can't really upgrade because of legacy code.
Just change the way the method above accesses the values: print the value of project and you can clearly see the object's properties.
The results will be returned as an array with columns requested encapsulated as attributes of the model you call this method from. If you call Product.find_by_sql then the results will be returned in a Product object with the attributes you specified in the SQL query.
If you call a complicated SQL query which spans multiple tables the columns specified by the SELECT will be attributes of the model, whether or not they are columns of the corresponding table.
Post.find_by_sql "SELECT p.title, c.author FROM posts p, comments c WHERE p.id = c.post_id"
> [#<Post:0x36bff9c @attributes={"title"=>"Ruby Meetup", "first_name"=>"Quentin"}>, ...]
Source: http://api.rubyonrails.org/v2.3.8/
Have you tried
a = Project.find_by_sql("SELECT mycolumn, mycolumn2 FROM my_table").each do |project|
project['mycolumn'] = project['mycolumn'].split('_').first
project.save
end
