Double join on same table produces wrong result - ruby-on-rails

Basically I have a Driver model that has many rides. Those rides has price field and I want to calculate driver's total_paid (the payment they have earned for all the time) and this_week_paid (the payment has been done only from the beginning of this week to the end of it) in one active record query.
I have achieved the correct number for total_paid part easily with one join like this:
Driver.joins(:rides).
select("#{Driver.table_name}.*, sum(substring(rides.price from '[0-9]+.[0-9]*')::numeric) as total_paid").
group("#{Driver.table_name}.id").
order("total_paid DESC, id")
Now when I try to add this_week_paid to that query:
Driver.joins("INNER JOIN rides this_week_rides ON #{Driver.table_name}.id = this_week_rides.driver_id").
joins("INNER JOIN rides all_rides ON #{Driver.table_name}.id = all_rides.driver_id").
select("#{Driver.table_name}.*, " +
"sum(substring(this_week_rides.price from '[0-9]+.[0-9]*')::numeric) as this_week_paid, " +
"sum(substring(all_rides.price from '[0-9]+.[0-9]*')::numeric) as total_paid").
where(this_week_rides: { created_at: Time.current.beginning_of_week..Time.current.end_of_week }).
group("#{Driver.table_name}.id").
order("this_week_paid DESC, id")
It runs without throwing any exceptions however, interestingly the total_paid field is two times of correct number and this_week_paid field is three times of the correct one ( Query answer: { this_week_paid: 188.46, total_paid: 159.9 }, the correct answer: { this_week_paid: 62.82, total_paid: 79.95 } ).
I did try to add where("this_week_rides.id != all_rides.id") and it gives me another wrong result ("this_week_paid" => 125.64,"total_paid" => 97.08)
What am I missing?

You join the same table twice and that will multiply the number of rows you get so that is why you get multiples of the expected result. Just join it once and filter in the select like this:
sum(substring(rides.price from '[0-9]+.[0-9]*')::numeric) filter (
where rides.created_at between time1 and time2
) as this_week_paid,
sum(substring(rides.price from '[0-9]+.[0-9]*')::numeric) as total_paid

Related

Multiplying Columns in Active Record?

Have the following sql query, how would I translate the multiplication of two columns in Active Record?
SELECT plan_name, plan_price, count(plan_id), plan_price*count(plan_id) AS totalrevenue
FROM leads
INNER JOIN plans p ON leads.plan_id = p.id
WHERE lead_status_id = 5
GROUP BY plan_name, plan_price;
I'd try something like this.
leads = Lead.joins(:plans).
where(lead_status_id: 5).
group(:plan_name, :plan_price).
select('plan_name, plan_price, count(plan_id), plan_price * count(plan_id) AS totalrevenue')
leads.each { |lead| puts lead.totalrevenue }
Note:
I don't know if it's joins(:plan) or joins(:plans), it depends on your associations
If you need to call even the plan_id count I'd give a name even to that field
A field like totalrevenue is not displaied if you print the entire object, because it's not part of it, it's a virtual attribute

Get the average of the most recent records within groups with ActiveRecord

I have the following query, which calculates the average number of impressions across all teams for a given name and league:
#all_team_avg = NielsenData
.where('name = ? and league = ?', name, league)
.average('impressions')
.to_i
However, there can be multiple entries for each name/league/team combination. I need to modify the query to only average the most recent records by created_at.
With the help of this answer I came up with a query which gets the result that I need (I would replace the hard-coded WHERE clause with name and league in the application), but it seems excessively complicated and I have no idea how to translate it nicely into ActiveRecord:
SELECT avg(sub.impressions)
FROM (
WITH summary AS (
SELECT n.team,
n.name,
n.league,
n.impressions,
n.created_at,
ROW_NUMBER() OVER(PARTITION BY n.team
ORDER BY n.created_at DESC) AS rowcount
FROM nielsen_data n
WHERE n.name = 'Social Media - Twitter Followers'
AND n.league = 'National Football League'
)
SELECT s.*
FROM summary s
WHERE s.rowcount = 1) sub;
How can I rewrite this query using ActiveRecord or achieve the same result in a simpler way?
When all you have is a hammer, everything looks like a nail.
Sometimes, raw SQL is the best choice. You can do something like:
#all_team_avg = NielsenData.find_by_sql("...your_sql_statement_here...")

Getting Conditional Count in Join with Laravel Query Builder

I am trying to achieve the following with Laravel Query builder.
I have a table called deals . Below is the basic schema
id
deal_id
merchant_id
status
deal_text
timestamps
I also have another table called merchants whose schema is
id
merchant_id
merchant_name
about
timestamps
Currently I am getting deals using the following query
$deals = DB::table('deals')
-> join ('merchants', 'deals.merchant_id', '=', 'merchants.merchant_id')
-> where ('merchant_url_text', $merchant_url_text)
-> get();
Since only 1 merchant is associated with a deal, I am getting deals and related merchant info with the query.
Now I have a 3rd table called tbl_deal_votes. Its schema looks like
id
deal_id
vote (1 if voted up, 0 if voted down)
timestamps
What I want to do is join this 3rd table (on deal_id) to my existing query and be able to also get the upvotes and down votes each deal has received.
To do this in a single query you'll probably need to use SQL subqueries, which doesn't seem to have good fluent query support in Laravel 4/5. Since you're not using Eloquent objects, the raw SQL is probably easiest to read. (Note the below example ignores your deals.deal_id and merchants.merchant_id columns, which can likely be dropped. Instead it just uses your deals.id and merchants.id fields by convention.)
$deals = DB::select(
DB::raw('
SELECT
deals.id AS deal_id,
deals.status,
deals.deal_text,
merchants.id AS merchant_id,
merchants.merchant_name,
merchants.about,
COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count,
COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count
FROM
deals
JOIN merchants ON (merchants.id = deals.merchant_id)
LEFT JOIN (
SELECT deal_id, count(*) AS upvotes_count
FROM tbl_deal_votes
WHERE vote = 1 && deal_id
GROUP BY deal_id
) tbl_upvotes ON (tbl_upvotes.deal_id = deals.id)
LEFT JOIN (
SELECT deal_id, count(*) AS downvotes_count
FROM tbl_deal_votes
WHERE vote = 0
GROUP BY deal_id
) tbl_downvotes ON (tbl_downvotes.deal_id = deals.id)
')
);
If you'd prefer to use fluent, this should work:
$upvotes_subquery = '
SELECT deal_id, count(*) AS upvotes_count
FROM tbl_deal_votes
WHERE vote = 1
GROUP BY deal_id';
$downvotes_subquery = '
SELECT deal_id, count(*) AS downvotes_count
FROM tbl_deal_votes
WHERE vote = 0
GROUP BY deal_id';
$deals = DB::table('deals')
->select([
DB::raw('deals.id AS deal_id'),
'deals.status',
'deals.deal_text',
DB::raw('merchants.id AS merchant_id'),
'merchants.merchant_name',
'merchants.about',
DB::raw('COALESCE(tbl_upvotes.upvotes_count, 0) AS upvotes_count'),
DB::raw('COALESCE(tbl_downvotes.downvotes_count, 0) AS downvotes_count')
])
->join('merchants', 'merchants.id', '=', 'deals.merchant_id')
->leftJoin(DB::raw('(' . $upvotes_subquery . ') tbl_upvotes'), function($join) {
$join->on('tbl_upvotes.deal_id', '=', 'deals.id');
})
->leftJoin(DB::raw('(' . $downvotes_subquery . ') tbl_downvotes'), function($join) {
$join->on('tbl_downvotes.deal_id', '=', 'deals.id');
})
->get();
A few notes about the fluent query:
Used the DB::raw() method to rename a few selected columns.
Otherwise, there would have been a conflict between deals.id
and merchants.id in the results.
Used COALESCE to default null votes to 0.
Split the subqueries into separate PHP strings to improve readability.
Used left joins for the subqueries so deals with no upvotes/downvotes still show up.

How to build inner join in Rails with conditions?

I've a model StockUpdate which keeps track of stocks for every product for a store. Table attributes are: :product_id, :stock, :store_id. I was trying to find out last entry for every product for a given store. According to that I build my query in PGAdmin which is given below and it's working fine. I'm new in Rails and I don't know how to represent it in Model. Please help.
SELECT a.*
FROM stock_updates a
INNER JOIN
(
SELECT product_id, MAX(id) max_id
FROM stock_updates where store_id = 9 and stock > 0
GROUP BY product_id
) b ON a.product_id = b.product_id AND
a.id = b.max_id
I does not clearly understand what you want to do, but I think you can do something like this:
class StockUpdate < ActiveRecord::Base
scope :a_good_name, -> { joins(:product).where('store_id = ? and stock > ?', 9, 0) }
end
You can all call StoclUpdate.a_good_name.explain to check the generated sql
What you need is really simple and can be easily accomplished with 2 queries. Otherwise it becomes very complicated in a single query (it's still doable though):
store_ids = [0, 9]
latest_stock_update_ids = StockUpdate.
where(store_id: store_ids).
group(:product_id).
maximum(:id).
values
StockUpdate.where(id: latest_stock_update_ids)
Two queries, without any joins necessary. The same could be possible with a single query too. But like your original code, it would include subqueries.
Something like this should work:
StockUpdate.
where(store_id: store_ids).
where("stock_updates.id = (
SELECT MAX(su.id) FROM stock_updates AS su WHERE (
su.product_id = stock_updates.product_id
)
)
")
Or perhaps:
StockUpdate.where("id IN (
SELECT MAX(su.id) FROM stock_updates AS su GROUP BY su.product_id
)")
And to answer your original question, you can manually specify a joins like so:
Model1.joins("INNER JOINS #{Model2.table_name} ON #{conditions}")
# That INNER JOINS can also be LEFT OUTER JOIN, etc.

Emulating an interval join in hive

I am using hive 0.13.
I have two tables:
data table. columns: id, time. 1E10 rows.
mymap table. columns: id, name, start_time, end_time. 1E6 rows.
For each row in the data table I want to get the name from the mymap table matching the id and the time interval. So I want to do a join like:
select data.id, time, name from data left outer join mymap on data.id = mymap.id and time>=start_time and time<end_time
It is known that for every row in data there are 0 or 1 matches in mymap.
The above query is not supported in hive as it is a non-equi-join. Moving the inequality conditions into a where filter does not work cause the join explodes before the filter is applied:
select data.id, time, name from data left outer join mymap on data.id = mymap.id where mymap.id is null or (time>=start_time and time<end_time)
(I am aware that the queries are not exactly equivalent due to cases where there is a match for id but no matching interval. This can be solved as I describe here: Hive: work around for non equi left join)
How can I go about this?
You could perform your join and then query from that table. I didn't test this code, but it would read something like
select id
,time
,name
from (
select d.id
,d.time
,m.name
,m.start_time
,m.end_time
from data as d LEFT OUTER JOIN mymap as m
ON d.id = m.id
) x
where time>=start_time
AND time<end_time
You could potentially get around this issue by flattening out the data structure in table2 and using a UDF to process the joined records.
select
id,
time,
nameFinderUDF(b.name_list, time) as name
from
data a
LEFT OUTER JOIN
(
select
id,
collect_set(array(name,cast(start_time as string),cast(end_time as string))) as name_list
from
mymap
group by
id
) b
ON (a.id=b.id)
With a UDF that does something like:
public String evaluate(ArrayList<ArrayList<String>> name_list,Long time) {
for (int i;i<name_list.length;i++) {
if (time >= Long.parseLong(name_list[i][1]) && time <= Long.parseLong(name_list[i][2])) {
return name_list[i][0]
return null;
}
This approach should make the merge 1 to 1, but it could create a fairly large data structure repeated many times. It is still quite a bit more efficient than a straight join.

Resources