Active Record query Distinct Group by - ruby-on-rails

I have a query that tries to find the users with the minimum score in each grade. Within a grade, several users can share the same minimum score.
Example: user A has a score of 2 and user B has a score of 2, so I expect to get both users, grouped by grade.
However, I am only getting one user. The query is:
users = Users.all
#user_score = users
.where.not(score: [ nil, 0 ])
.select('DISTINCT ON ("users"."grade") grade, "users".*')
.order('"users"."grade" ASC, "users"."score" ASC')
.group_by(&:grade)
Can someone point out what I am doing wrong here?

DISTINCT ON ("users"."grade") keeps only the first row for each grade, so this query can never return several users that share the same minimum score.
I think you can achieve the desired result with a window function:
SELECT * FROM
(SELECT *, rank() OVER (PARTITION BY grade ORDER BY score) AS places
FROM users
WHERE score IS NOT NULL AND score != 0
) AS ranked_by_score
WHERE places = 1;
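To make the effect of rank() = 1 concrete, here is a minimal plain-Ruby sketch of the same selection done in memory. UserRow, SAMPLE_USERS and lowest_scorers_by_grade are hypothetical names invented for this illustration; they are not part of the question's code.

```ruby
# Hypothetical stand-in for rows of the users table.
UserRow = Struct.new(:name, :grade, :score)

SAMPLE_USERS = [
  UserRow.new("A", 1, 2),
  UserRow.new("B", 1, 2),
  UserRow.new("C", 1, 5),
  UserRow.new("D", 2, 0),   # skipped: score is 0
  UserRow.new("E", 2, nil), # skipped: score is NULL
  UserRow.new("F", 2, 7)
].freeze

# Returns { grade => [every user tied for the lowest score in that grade] },
# which is what keeping only rank = 1 per partition selects.
def lowest_scorers_by_grade(users)
  users
    .reject { |u| u.score.nil? || u.score.zero? }
    .group_by(&:grade)
    .transform_values do |rows|
      min = rows.map(&:score).min
      rows.select { |u| u.score == min }
    end
end

lowest_scorers_by_grade(SAMPLE_USERS).each do |grade, rows|
  puts "grade #{grade}: #{rows.map(&:name).join(', ')}"
end
```

In a Rails app the SQL above would normally stay in the database, for example by passing it to find_by_sql on the model, assuming the model maps to the users table.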

Related

Double join on same table produces wrong result

Basically I have a Driver model that has many rides. Each ride has a price field, and I want to calculate a driver's total_paid (everything they have earned so far) and this_week_paid (payments made from the beginning to the end of the current week) in one Active Record query.
I got the correct number for the total_paid part easily with one join, like this:
Driver.joins(:rides).
select("#{Driver.table_name}.*, sum(substring(rides.price from '[0-9]+.[0-9]*')::numeric) as total_paid").
group("#{Driver.table_name}.id").
order("total_paid DESC, id")
Now when I try to add this_week_paid to that query:
Driver.joins("INNER JOIN rides this_week_rides ON #{Driver.table_name}.id = this_week_rides.driver_id").
joins("INNER JOIN rides all_rides ON #{Driver.table_name}.id = all_rides.driver_id").
select("#{Driver.table_name}.*, " +
"sum(substring(this_week_rides.price from '[0-9]+.[0-9]*')::numeric) as this_week_paid, " +
"sum(substring(all_rides.price from '[0-9]+.[0-9]*')::numeric) as total_paid").
where(this_week_rides: { created_at: Time.current.beginning_of_week..Time.current.end_of_week }).
group("#{Driver.table_name}.id").
order("this_week_paid DESC, id")
It runs without raising any exceptions. Interestingly, however, total_paid comes out as twice the correct number and this_week_paid as three times the correct number (query result: { this_week_paid: 188.46, total_paid: 159.9 }; correct answer: { this_week_paid: 62.82, total_paid: 79.95 }).
I tried adding where("this_week_rides.id != all_rides.id"), which gives another wrong result ("this_week_paid" => 125.64, "total_paid" => 97.08).
What am I missing?
You join the same table twice, which multiplies the number of rows: each this-week ride is paired with every ride of the same driver, so both sums come out as multiples of the expected values. Join the table once and filter inside the aggregate instead:
sum(substring(rides.price from '[0-9]+.[0-9]*')::numeric) filter (
where rides.created_at between time1 and time2
) as this_week_paid,
sum(substring(rides.price from '[0-9]+.[0-9]*')::numeric) as total_paid
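The row multiplication can be reproduced in plain Ruby. The sketch below uses made-up data (one driver, three rides, two of them this week); RideRow, SAMPLE_RIDES and both method names are hypothetical and only illustrate the two query shapes.

```ruby
# Hypothetical rides of a single driver; this_week marks rides inside the week range.
RideRow = Struct.new(:driver_id, :price, :this_week)

SAMPLE_RIDES = [
  RideRow.new(1, 10.0, true),
  RideRow.new(1, 20.0, true),
  RideRow.new(1, 30.0, false)
].freeze

# Mimics the double join: every this-week ride is paired with every ride,
# so each price is counted once per partner row.
def double_join_totals(rides)
  pairs = rides.select(&:this_week).product(rides)
  {
    this_week_paid: pairs.sum { |this_week_ride, _any_ride| this_week_ride.price },
    total_paid: pairs.sum { |_this_week_ride, any_ride| any_ride.price }
  }
end

# Mimics the single join with FILTER: each ride is seen exactly once.
def filtered_totals(rides)
  {
    this_week_paid: rides.select(&:this_week).sum(&:price),
    total_paid: rides.sum(&:price)
  }
end

p double_join_totals(SAMPLE_RIDES)
p filtered_totals(SAMPLE_RIDES)
```

With two this-week rides and three rides overall, the double join inflates total_paid by a factor of 2 and this_week_paid by a factor of 3, the same pattern the question reports.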

rails group order by count

I have five tables: users, interests, animals, interests_animals, interests_users.
User
foo
Interest 1, 2, 3
Animal 1, 2, 3
foo has interests 1, 2, 3
Interest 1 has Animal 1, 2
Interest 2 has Animal 1, 2, 3
Interest 3 has Animal 3
I need to return all animals through the interests of foo, grouped by interest id and ordered by animal count.
I am trying it like this:
SELECT animals.* FROM animals
INNER JOIN interests_animals ON animals.id = interests_animals.animal_id
INNER JOIN interests ON interests_animals.interest_id = interests.id
INNER JOIN interests_users ON interests.id = interests_users.interest_id
WHERE interests_users.user_id = XXX
GROUP BY animals.id
ORDER BY COUNT(interests_animals.animal_id);
I need the animals to be returned in the order 2, 1, 3, but I always get 1, 2, 3.
You need to explicitly list in the SELECT clause the column(s) you GROUP BY.
All other parts of the SELECT clause must be aggregates such as count(), sum(), etc.
Note that we use count(distinct ...) here because each animal ID might appear multiple times due to the chain of JOINs:
SELECT
interests.id,
COUNT(DISTINCT animals.id) as animals_count
FROM animals
JOIN interests_animals ON animals.id = interests_animals.animal_id
JOIN interests ON interests_animals.interest_id = interests.id
JOIN interests_users ON interests.id = interests_users.interest_id
WHERE interests_users.user_id = XXX
GROUP BY 1
ORDER BY 2 desc;
-- in GROUP BY and ORDER BY, it is often convenient to use just numbers: "1" means "the 1st column of the SELECT clause", etc.
Also, "INNER" is an optional keyword ("JOIN" and "INNER JOIN" are the same thing).
As a side note, you might find it useful to add this to your SELECT clause:
, array_agg(animals.id order by animals.id) as animal_ids
-- this gives you an ordered integer array of all animal IDs related to a particular interest.
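As a rough check of what the grouped query returns, the same aggregation can be sketched in plain Ruby using the sample links from the question. INTEREST_ANIMALS and interests_by_animal_count are hypothetical names for this illustration only.

```ruby
# interest id => animal ids linked through interests_animals (data from the question).
INTEREST_ANIMALS = {
  1 => [1, 2],
  2 => [1, 2, 3],
  3 => [3]
}.freeze

# Returns [[interest_id, animals_count, animal_ids], ...] with the largest count first,
# mirroring COUNT(DISTINCT animals.id), array_agg(...) and ORDER BY 2 DESC.
def interests_by_animal_count(interest_animals)
  interest_animals
    .map do |interest_id, animal_ids|
      distinct_ids = animal_ids.uniq.sort
      [interest_id, distinct_ids.size, distinct_ids]
    end
    .sort_by { |_interest_id, count, _ids| -count }
end

interests_by_animal_count(INTEREST_ANIMALS).each do |interest_id, count, ids|
  puts "interest #{interest_id}: #{count} animals #{ids.inspect}"
end
```

On this data the interests come back in the order 2, 1, 3, which matches the ordering asked for in the question.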

Mysql matchmaking / pairing

I'm currently working on a 1v1 online game and I ran into a problem when trying to match up players.
A player who wants to play gets put into a matchmaking table
id, user, amount
Now I want to query the matchmaking table for the best possible pairs of users (that is, users who want to play for the same amount).
I also want users who have been waiting longer (smaller id) to be paired up first.
So far I have this query:
SELECT *
FROM matchmaking a, wpr_matchmaking b
WHERE a.user != b.user
AND a.amount = b.amount
ORDER BY a.id ASC , b.id ASC
LIMIT 0 , 30
This returns all possible pairings, so in a table with this content:
id, user, amount
1, 1, 10
2, 2, 10
3, 3, 10
I get the pairs:
1,2
1,3
2,1
2,3
3,1
3,2
Whereas I only want 1,2 returned in that case.
How do I make it only show me each user at most once?
Edit: adding the condition 'and a.id < b.id' to the query reduces the pairings by a factor of 2, but there are still too many.
Do you just want to take the top pair, match those two players, and then rerun the query? You could limit the result to one row; in MySQL that is LIMIT 1 (SELECT TOP 1 is SQL Server syntax).
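A different approach from rerunning the query is to fetch the waiting rows once and pair them in application code. The following is only a sketch of that idea in plain Ruby; Waiting, WAITING_PLAYERS and pair_players are hypothetical names, and it assumes each user appears at most once in the matchmaking table.

```ruby
# Hypothetical stand-in for rows of the matchmaking table.
Waiting = Struct.new(:id, :user, :amount)

WAITING_PLAYERS = [
  Waiting.new(1, 1, 10),
  Waiting.new(2, 2, 10),
  Waiting.new(3, 3, 10)
].freeze

# Pairs players who want the same amount, longest-waiting (smallest id) first.
# Each row is used in at most one pair; an odd player out keeps waiting.
def pair_players(rows)
  rows
    .sort_by(&:id)
    .group_by(&:amount)
    .values
    .flat_map { |same_amount| same_amount.each_slice(2).select { |slice| slice.size == 2 } }
    .map { |first, second| [first.user, second.user] }
end

p pair_players(WAITING_PLAYERS)
```

For the three sample rows this yields the single pair of users 1 and 2, and user 3 stays in the queue until another player with the same amount arrives.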

How to use joins and averages together in Hive queries

I have two tables in hive:
Table1: uid,txid,amt,vendor
Table2: uid,txid
Now I need to join the tables on txid, which confirms that a transaction was finally recorded. Some transactions will be present only in Table1 and not in Table2.
I need to find the average rate of transaction matches per user (uid) per vendor. Then I need the average of these averages: add up all the per-user averages for a vendor and divide by the number of unique users for that vendor.
Let's say I have the data:
Table1:
u1,120,44,vend1
u1,199,33,vend1
u1,100,23,vend1
u1,101,24,vend1
u2,200,34,vend1
u2,202,32,vend2
Table2:
u1,100
u1,101
u2,200
u2,202
Example For vendor vend1:
u1-> Avg transaction find rate = 2(matches found in both Tables,Table1 and Table2)/4(total occurrence in Table1) =0.5
u2 -> Avg transaction find rate = 1/1 = 1
Avg of avgs = (0.5 + 1) (sum of avgs) / 2 (total unique users) = 0.75
Required output:
vend1,0.75
vend2,1
I can't work out how to get both the count of matches and the count of occurrences in Table1 per user per vendor in a single Hive query. This is the query I have so far, and I don't know how to take it further.
SELECT A.vendor,A.uid,count(*) as totalmatchesperuser FROM Table1 A JOIN Table2 B ON A.uid = B.uid AND B.txid =A.txid group by vendor,A.uid
Any help would be great.
I think you are running into trouble with your JOIN. When you JOIN by txid and uid, you are losing the total number of uid's per group. If I were you I would assign a column of 1's to table2 and name the column something like success or transaction and do a LEFT OUTER JOIN. Then in your new table you will have a column with the number 1 in it if there was a completed transaction and NULL otherwise. You can then do a case statement to convert these NULLs to 0
Query:
select vendor
,(SUM(avg_uid) / COUNT(uid)) as avg_of_avgs
from (
select vendor
,uid
,AVG(complete) as avg_uid
from (
select uid
,txid
,amt
,vendor
,case when success is null then 0
else success
end as complete
from (
select A.*
,B.success
from table1 as A
LEFT OUTER JOIN table2 as B
ON B.txid = A.txid
) x
) y
group by vendor, uid
) z
group by vendor
Output:
vend1 0.75
vend2 1.0
B.success in line 17 is the column of 1's that I put into table2 before the JOIN. If you are curious about case statements in Hive you can find them here.
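The arithmetic behind the nested query can be checked against the question's sample data with a short plain-Ruby sketch. TABLE1, TABLE2 and average_find_rate_per_vendor are hypothetical names used only here; like the query, the sketch matches rows on txid alone.

```ruby
# Sample rows from the question: [uid, txid, amt, vendor] and [uid, txid].
TABLE1 = [
  ["u1", 120, 44, "vend1"],
  ["u1", 199, 33, "vend1"],
  ["u1", 100, 23, "vend1"],
  ["u1", 101, 24, "vend1"],
  ["u2", 200, 34, "vend1"],
  ["u2", 202, 32, "vend2"]
].freeze

TABLE2 = [["u1", 100], ["u1", 101], ["u2", 200], ["u2", 202]].freeze

# Returns { vendor => average of the per-user match rates }.
def average_find_rate_per_vendor(table1, table2)
  recorded_txids = table2.map { |_uid, txid| txid }

  # Per (vendor, uid): matched rows / all rows, like AVG(complete) after the LEFT OUTER JOIN.
  per_user_rates = table1
    .group_by { |uid, _txid, _amt, vendor| [vendor, uid] }
    .map do |(vendor, _uid), rows|
      matches = rows.count { |_u, txid, _a, _v| recorded_txids.include?(txid) }
      [vendor, matches.to_f / rows.size]
    end

  # Per vendor: sum of the user rates / number of users.
  per_user_rates
    .group_by { |vendor, _rate| vendor }
    .transform_values { |pairs| pairs.sum { |_vendor, rate| rate } / pairs.size }
end

p average_find_rate_per_vendor(TABLE1, TABLE2)
```

For vend1 the per-user rates are 0.5 (u1) and 1.0 (u2), giving 0.75; vend2 has a single user with rate 1.0, in line with the output shown above.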
Amazing and precise answer by GoBrewers14! Thank you so much. I was looking at it from the wrong perspective.
I made a few small changes to the query to get things finally done.
I didn't need to add a "success" column to table2. Instead of B.success I checked B.txid in the above query: B.txid is NULL when no match is found and holds a value when a match is found, so it captures success and failure without a new column. In the enclosing subquery I then map NULL to 0 and non-NULL to 1. I also renamed some columns because Hive reported them as ambiguous.
The final query looks like this:
select vendr
,(SUM(avg_uid) / COUNT(usrid)) as avg_of_avgs
from (
select vendr
,usrid
,AVG(complete) as avg_uid
from (
select usrid
,txnid
,amnt
,vendr
,case when success is null then 0
else 1
end as complete
from (
select A.uid as usrid,A.vendor as vendr,A.amt as amnt,A.txid as txnid
,B.txid as success
from Table1 as A
LEFT OUTER JOIN Table2 as B
ON B.txid = A.txid
) x
) y
group by vendr, usrid
) z
group by vendr;

Ranking position

I have a Rails application with the following models:
User
Bet
User has_many bets and Bet belongs_to User. Every Bet has a profitloss value, which states how much the User has won or lost on that Bet.
So to calculate how much a specific User has won overall, I sum his bets in the following way:
user.bets.sum(:profitloss)
I would like to show the User his ranking compared to all the other Users, which could look something like this:
"Your overall ranking: 37th place"
To do so I need to sum up the overall winnings per User and find out which position the current user is in.
How do I do that without overloading the server? :)
Thanks!
You can try something similar to
User.joins(:bets).
select("users.id, sum(bets.profitloss) as accumulated").
group("users.id").
order("accumulated DESC")
and then search the resulting list of "users" (not real users: they have only two meaningful attributes, their ID and an accumulated attribute holding the sum) for the one corresponding to the current user.
In any case, to get a single user's position you have to calculate every user's accumulated value, but at least this is only one query. Even better, you can store the accumulated value in the user model and query just that for the ranking.
If you have a large number of Users and Bets, you won't be able to compute and sort the global profitloss of each user "on demand", so I suggest a rake task that you schedule regularly (once a day, every hour, etc.).
Add a position column to the User model, get the list of all Users, compute their global profitloss, sort the Users by it, and finally update each User's position attribute with their position in the list.
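The position computation such a scheduled task would perform can be sketched in plain Ruby. BETS_BY_USER and ranking_positions are hypothetical names with made-up data, and the sketch assumes that users with the same total share a position; writing the result back to a position column is left out.

```ruby
# Hypothetical data: user id => profitloss values of that user's bets.
BETS_BY_USER = {
  1 => [10, -5],
  2 => [20],
  3 => [5, 0],
  4 => [-3]
}.freeze

# Returns { user_id => position }, where position 1 is the highest total.
# Users with equal totals get the same position (an assumption of this sketch).
def ranking_positions(bets_by_user)
  totals = bets_by_user.transform_values(&:sum)

  # Walk the totals from highest to lowest and remember the first place each total appears.
  position_of_total = {}
  totals.values.sort.reverse.each_with_index do |total, index|
    position_of_total[total] ||= index + 1
  end

  totals.transform_values { |total| position_of_total[total] }
end

p ranking_positions(BETS_BY_USER)
```

Here users 1 and 3 both total 5 and share second place, so user 4 lands in fourth place rather than third.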
The best way to do it is to keep a precalculated total in your database, either on the user model itself or on a separate model that has a 1:1 relation to user. If you don't, you will have to calculate the sum for all users every time you need a rating, which means a full table operation on the bets table. That said, this query will give you the desired result; if more than one person has the same total, they all get the same rating X:
select id, (select count(h.id) from users u inner join
(select user_id, sum(profitloss) as `total` from bets group by user_id) b2
on b2.user_id = u.id, (select id from users) h inner join
(select user_id, sum(profitloss) as `total` from bets group by user_id) b
on b.user_id = h.id where u.id = 1 and (b.total > b2.total))
as `rating` from users where id = 1;
You will need to plug the user's id into the query wherever it says id = 1.
If you add a column to the users table to keep track of the total, the query is a little simpler. In this example the column is named total_profit_loss:
select id, total_profit_loss, (select count(h.username)+1 from users u,
(select username, total_profit_loss from users) h
where u.id = 1 and (h.total_profit_loss > u.total_profit_loss))
as `rating` from users where id = 1;
