Get top N items per group in Ruby on Rails - ruby-on-rails

I have a model with the fields "date" and "frequency" (Frequency is an integer). I'm trying to get the top 5 frequencies per date.
Essentially I want to group by date, then get the top 5 per group.
What I have so far only retrieves the top 1 in the group:
Observation.channel("channelOne").order('date', 'frequency desc').group(:date).having('frequency = MAX(frequency)')
I want the MAX(frequency) PLUS the second, third, fourth and fifth largest PER DATE.
Sorry if this is really simple or if my terminology is off; I've just started with rails :)

You can use this:
Observation
.select("obs1.*")
.from("observations obs1")
.joins("LEFT JOIN observations AS obs2 ON obs1.date = obs2.date AND obs1.frequency <= obs2.frequency")
.group("obs1.date, obs1.id")
.having("count(*) <= 5")
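.order("obs1.date, obs1.frequency DESC")
For each row, the self-join counts the rows on the same date that have an equal or higher frequency; keeping only the rows where that count is at most 5 leaves the top 5 frequencies for each date. Note that ties at the cutoff can push rows past the count of 5, so a date may return fewer than five rows.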
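For small result sets, the same per-date selection can also be done in plain Ruby after the rows are loaded. The sketch below is not the SQL approach above: it uses a hypothetical Obs struct as a stand-in for Observation records and relies on Enumerable#group_by plus max_by(n).

```ruby
# Hypothetical stand-in for Observation records (not the real model).
Obs = Struct.new(:date, :frequency)

# Returns { date => [up to n rows with the highest frequency, highest first] }.
def top_n_per_date(rows, n)
  rows.group_by(&:date).transform_values do |group|
    group.max_by(n, &:frequency)
  end
end

rows = [
  Obs.new("2021-01-01", 3), Obs.new("2021-01-01", 9),
  Obs.new("2021-01-01", 5), Obs.new("2021-01-02", 7)
]
top = top_n_per_date(rows, 2)
```

Unlike the HAVING count(*) <= 5 query, this version always returns n rows per date when that many exist, because ties are cut arbitrarily instead of being dropped.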

Related

How to count bookings if allowing for efficient overlap

I've looked over some SO discussions here and, well at least I haven't seen this perspective. I'm trying to write code to count bookings of a given resource, where I want to find the MINIMUM number of resources I need to fulfill all bookings.
Let's use an example of hotel rooms. Given that I have the following bookings
Chris: July 4-July 17
Pat: July 15-July 19
Taylor: July 10-July 11
Chris calls and would like to add some room(s) to their reservation for friends, and wonders how many rooms I have available.
Rooms_available = Rooms_in_hotel - Rooms_booked
Rooms_booked is where I'm having trouble. It seems like most questions (and indeed my code) just look at overlapping dates. So it would do something like this:
Booking.where("booking_end >= :start_check AND booking_start <= :end_check", { start_check: "July 4, 2021".to_date, end_check: "July 17, 2021".to_date})
This query would match 3 bookings. Which means that if the hotel theoretically had 5 rooms, I would tell Chris that there were 2 more rooms left available.
However, while this method of counting is technically accurate, it misses the possibility of an efficient overlap. Namely that since Taylor checks out 4 days before Pat checks in, they can both be "assigned" the same room. So technically, I can offer 3 more rooms to Chris.
So my question is how do I more accurately calculate Rooms_booked allowing for efficient overlap (i.e., efficient resource allocation)? Is there a query using ActiveRecord or what calculation do I impose on top of the existing query?
I don't think a query alone can solve your problem (or it would be a very complex query).
My idea is to group (and count) the bookings by (booking_start, booking_end), ordered by booking_start ascending, and then re-assign rooms. For example, if there are 2 bookings for July 15-July 19 and 3 bookings for July 10-July 11, only 2 pairs can share a room, so 3 rooms are needed: 2 rooms that each serve July 10-July 11 followed by July 15-July 19, and 1 room for the remaining July 10-July 11 booking.
Do the re-assignment in code rather than in the query (it can be optimized by picking a narrow time range):
# When Chris calls to add room(s) to their reservation for friends
# and asks how many rooms are available,
# pick (start_time, end_time) wide enough to include
# the bookings made {n} months ago,
# but no wider than necessary (and not all time).
scope :booking_available_count, -> (start_time, end_time) {
  group_by_time =
    Booking.where("booking_start >= ? AND booking_end <= ?", start_time, end_time)
           .group(:booking_start, :booking_end).order(:booking_start).count
  # result: {[booking_start, booking_end] => 1, [booking_start, booking_end] => 2, ... }
  # ordered by booking_start ASC,
  # so rooms can be re-assigned from left to right as below
  booked = 0
  group_by_time.each do |(_booking_start, booking_end), count|
    group_by_time.each do |(assign_start, _assign_end), assign_count|
      # a room can only be handed over once the earlier booking has ended
      next if booking_end > assign_start
      count -= assign_count # re-assign
      break if count <= 0
    end
    booked += count if count > 0
  end
  # return the number of available rooms
  # (may be negative)
  Room.count - booked
}
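As a cross-check, here is a different technique from the re-assignment loop above: the minimum number of rooms equals the peak number of bookings active at the same time, which a sweep over check-in and check-out events can find. This is a minimal sketch in plain Ruby; the Stay struct and rooms_needed are hypothetical names, and it assumes a room vacated on a given day can be re-used by a booking starting that same day.

```ruby
require "date"

# Hypothetical stand-in for a Booking record.
Stay = Struct.new(:name, :booking_start, :booking_end)

# Minimum rooms needed = peak number of simultaneously active stays.
def rooms_needed(stays)
  # +1 when a stay begins, -1 when it ends.
  events = stays.flat_map do |s|
    [[s.booking_start, 1], [s.booking_end, -1]]
  end
  current = 0
  peak = 0
  # Sorting [date, delta] pairs puts check-outs (-1) before check-ins (+1)
  # on the same date, so a vacated room is re-used that day (assumption).
  events.sort.each do |_date, delta|
    current += delta
    peak = current if current > peak
  end
  peak
end

stays = [
  Stay.new("Chris",  Date.new(2021, 7, 4),  Date.new(2021, 7, 17)),
  Stay.new("Pat",    Date.new(2021, 7, 15), Date.new(2021, 7, 19)),
  Stay.new("Taylor", Date.new(2021, 7, 10), Date.new(2021, 7, 11))
]
booked = rooms_needed(stays)
```

For the three bookings in the question this gives 2 rooms booked, so a 5-room hotel has 3 rooms left to offer.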

Rails: how to calculate the average of a small set of elements

I've looked at resources to know how to find the average with RoR built-in average ActiveRecord::Calculations. I've also looked online for ideas on how to calculate averages: Rails calculate and display average.
But I cannot find any reference on how to calculate the average of a subset of the rows in a database column.
In the controller:
@jobpostings = Jobposting.all
@medical = @jobpostings.where("title like ? OR title like ?", "%MEDICAL SPECIALIST%", "%MEDICAL EXAMINER%").limit(4).order('max_salary DESC')
@medical_salary = "%.2f" % @medical.average(:max_salary).truncate(2)
@medical returns:
CITY MEDICAL SPECIALIST
220480.0
CITY MEDICAL EXAMINER (OCME)
180000.0
CITY MEDICAL SPECIALIST
158080.0
CITY MEDICAL SPECIALIST
130000.0
I want to find the average of :max_salary (each one is listed correctly under the job title). I use @medical_salary = "%.2f" % @medical.average(:max_salary).truncate(2) to convert the BigDecimal number and find the average of :max_salary from @medical, which I thought would be limited to the top 4 displayed above.
But the result returned is 72322.33, which is the average of the entire column (I checked), instead of the top 4.
Do I need to add another condition? Why does the average of @medical return the average of the entire column?
Any insight would help. Thanks.
@medical.average(:max_salary) is effectively expanded as @jobpostings.where(...).average(:max_salary).limit(4), even though limit(4) appears earlier in the method chain.
You can confirm this by checking the query that is run, which will be as follows:
SELECT AVG(`jobpostings`.`max_salary`) FROM `jobpostings` ... LIMIT 4
Effectively, LIMIT 4 doesn't do anything because there is only one average number in the result of the above query.
One way to accomplish what you are trying to do will be:
$ @top_salaries = @jobpostings.where("title like ? OR title like ?", "%MEDICAL SPECIALIST%", "%MEDICAL EXAMINER%").order('max_salary DESC').limit(4).map(&:max_salary)
=> [ 220480.0, 180000.0, 158080.0, 130000.0]
$ average = @top_salaries.reduce(:+) / @top_salaries.size
=> 172140.0
@medical.average(:max_salary)
This line corresponds to the following MySQL query:
SELECT AVG(`jobpostings`.`max_salary`) AS avg_max_salary FROM `jobpostings` WHERE (`jobpostings`.`title` like "%MEDICAL SPECIALIST%" OR title like "%MEDICAL EXAMINER%") LIMIT 4
Since MySQL has already calculated AVG on the column max_salary, it returns only 1 row. The LIMIT 4 clause doesn't actually come into play.
You can try the following:
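@limit = 4
@medical = @jobpostings.where("title like ? OR title like ?", "%MEDICAL SPECIALIST%", "%MEDICAL EXAMINER%").limit(@limit).order('max_salary DESC')
salaries = @medical.pluck(:max_salary)
@medical_salary = salaries.empty? ? "0.00" : "%.2f" % (salaries.sum / salaries.size.to_f).truncate(2)
Dividing by the number of rows actually returned keeps the average correct when fewer than @limit rows match, and the empty check covers the case where the number of results is 0.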
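The arithmetic both answers rely on can be checked in plain Ruby without a database. This is only an illustrative sketch: average_of_top is a hypothetical helper, and the fifth salary in the sample list is made up to show that values outside the top 4 are ignored.

```ruby
# Hypothetical helper: average of the n largest values, or nil if there are none.
def average_of_top(values, n)
  top = values.max(n)          # the n largest values, largest first
  return nil if top.empty?
  top.sum / top.size.to_f      # divide by the rows actually present
end

# The four salaries from the question plus one made-up lower value.
salaries = [130000.0, 220480.0, 50000.0, 180000.0, 158080.0]
average = average_of_top(salaries, 4)
formatted = "%.2f" % average
```

Here average is 172140.0, matching the value shown in the first answer.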

Payout Algorithm Ruby

I'm building a Rails app that has users and scores. I want the top half of the users to get paid out. I have a separate tiebreaker input stored for each user in case they happen to tie for last place (the last paid-out place). For example, if there are 8 users and the 4th and 5th tie on points, the tiebreaker should be applied.
This is what I have tried:
First I count the users and determine the top half of the players:
theUsersCount = ParticipatingUser.where(game_id: game_id).size
numofWinners = theUsersCount / 2
Then I am taking the users and their scores and pushing it to an array then only showing the top half of the users that won.
userscores.push("#{user.username}" => playerScore)
userscores[0..numofWinners].sort_by { |y| y[:score] }
But I am unsure how to execute the tiebreaker if there is a tie for last place.
To get the users count you should use count rather than size - size fetches all the rows, then counts them, while count counts the rows in the DB, and returns the number:
user_count = ParticipatingUser.where(game_id: game_id).count
(Actually, the above is wrong - here is an explanation - you should use size, which smartly chooses between length and count. Thanks @nzifnab.)
Now, find the score of the user in the user_count/2 place:
minimal_score = ParticipatingUser.order(score: :desc).pluck(:score).take(user_count/2).last
And take all the users with this score or more:
winning_users = ParticipatingUser.where('score >= ?', minimal_score).order(score: :desc)
now check if there are more users than expected:
if winning_users.size > user_count/2
then break your ties:
tie_breaker(winning_users[user_count/2-1..-1])
All together:
user_count = ParticipatingUser.where(game_id: game_id).size
# order(score: :desc) sorts by score descending; order(:score, :desc) would not
minimal_score = ParticipatingUser.order(score: :desc).pluck(:score).take(user_count/2).last
winning_users = ParticipatingUser.where('score >= ?', minimal_score).order(score: :desc)
if winning_users.size > user_count/2
  losers = tie_breaker(winning_users[user_count/2-1..-1])
  winning_users -= losers
end
winning_users
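The tie_breaker step is left undefined above. The following plain-Ruby sketch shows one way the whole selection could work on in-memory data. Everything in it is an assumption for illustration: the Player struct, the winners method, and especially the rule that the larger tiebreaker value wins, since the question does not say how the tiebreaker is compared.

```ruby
# Hypothetical in-memory stand-in for a participating user.
Player = Struct.new(:username, :score, :tiebreaker)

# Returns the top half of players; ties at the last paid place are
# resolved by tiebreaker (assumption: larger tiebreaker wins).
def winners(players)
  slots = players.size / 2
  return [] if slots.zero?

  ranked = players.sort_by { |p| -p.score }
  cutoff = ranked[slots - 1].score                  # score of the last paid place
  safe = ranked.select { |p| p.score > cutoff }     # clearly above the cutoff
  tied = ranked.select { |p| p.score == cutoff }    # competing for remaining slots
  safe + tied.max_by(slots - safe.size) { |p| p.tiebreaker }
end

players = [
  Player.new("a", 100, 1), Player.new("b", 90, 2),
  Player.new("c", 80, 5),  Player.new("d", 70, 3),
  Player.new("e", 70, 9),  Player.new("f", 60, 4),
  Player.new("g", 50, 7),  Player.new("h", 40, 8)
]
paid = winners(players).map(&:username)
```

With 8 players and the 4th and 5th tied on 70 points, the three higher scores are paid outright and the tiebreaker decides the fourth slot between "d" and "e".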

Sybase compare columns with duplicate row ids

So far I have a query with a result set (in a temp table) with several columns, but I am only concerned with four. One is a customer ID (varchar), one is Date (smalldatetime), one is Amount (money) and the last is Type (char). I have multiple rows with the same customer ID and want to evaluate them based on Date, Amount and Type. For example:
Customer ID   Date      Amount   Type
A             1-1-10    200      blue
A             1-1-10    400      green
A             1-2-10    400      green
B             1-11-10   100      blue
B             1-11-10   100      red
For all occurrences of A I want to compare them to identify only one, first by earliest date, then by greatest Amount, then if still tied by comparing Types. I would then return one row for each customer.
I would provide some of the query but I am at home now after spending two days trying to get a correct result. It looks something like this:
(query to populate #tempTable)
GROUP BY customer_id
HAVING date_cd =
(SELECT MIN(date_cd)
FROM order_table ot
WHERE ot.customerID = #tempTable.customerID
)
OR date_cd IS NULL
I assume the HAVING would result in only one row per customer_id. This did not end up being the case since there were some ties there.
I am not sure I can do the OR - there are some with NULL values here - and it did not account for the step to the next comparison if they were all the same anyway. I am not seeing a way to avoid doing some row processing of the temp table with some kind of IF or WHERE loop.
As I write, I am thinking maybe I should use #tempTable.date_cd in the HAVING clause instead of looking at the original table, but shouldn't that return the same dates?
Am I on the right track or is there something missing? Suggestions? More info??
Try the query below:
select * from #tempTable
GROUP BY customer_id
HAVING isnull(date_cd,"1900/01/01") =min(isnull(date_cd,"1900/01/01"))
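The selection rule described in the question (earliest date, then greatest amount, then type) can also be written as a composite sort key. The sketch below is plain Ruby rather than Sybase SQL, and it is illustrative only: the Row struct and pick_one_per_customer are hypothetical names, dates are assumed to be non-NULL, and the final tie on Type is assumed to be broken alphabetically because the question does not specify the rule.

```ruby
require "date"

# Hypothetical stand-in for a row of the temp table.
Row = Struct.new(:customer_id, :date, :amount, :type)

# One row per customer: earliest date, then greatest amount,
# then alphabetically first type (assumption).
def pick_one_per_customer(rows)
  rows.group_by(&:customer_id).map do |_id, group|
    group.min_by { |r| [r.date, -r.amount, r.type] }
  end
end

rows = [
  Row.new("A", Date.new(2010, 1, 1),  200, "blue"),
  Row.new("A", Date.new(2010, 1, 1),  400, "green"),
  Row.new("A", Date.new(2010, 1, 2),  400, "green"),
  Row.new("B", Date.new(2010, 1, 11), 100, "blue"),
  Row.new("B", Date.new(2010, 1, 11), 100, "red")
]
picked = pick_one_per_customer(rows)
```

For the sample data this keeps the 400/green row dated 1-1-10 for customer A and the blue row for customer B.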

Pad summed results with 0

I have a rails 4 (ruby 2) app that tracks time for employees against various companies. I need to get a sum of the minutes per company per date. My problem is I'm not sure the best way to pad date/company pairs with 0 if there are no time entries for that company on that day.
Tables
Companies: id, name, ...
Time_Entries: id, created_at, company_id, minutes, ...
Current output given only 2 companies and 2 days,
[{"company_id":1,"company_name":"Company A","date":"2013-06-24","minutes":987},
{"company_id":1,"company_name":"Company A","date":"2013-06-25","minutes":5},
{"company_id":2,"company_name":"Company B","date":"2013-06-24","minutes":500}]
The expected output pads days that aren't recorded with 0, so the list gains one additional item (the last item below is the new one):
[{"company_id":1,"company_name":"Company A","date":"2013-06-24","minutes":987},
{"company_id":1,"company_name":"Company A","date":"2013-06-25","minutes":5},
{"company_id":2,"company_name":"Company B","date":"2013-06-24","minutes":500},
{"company_id":2,"company_name":"Company B","date":"2013-06-25","minutes":0}]
Current Query (PostgreSQL)
@minutes = TimeEntry.where("created_at >= ?", 1.week.ago.utc)
.group('companies.id, date(created_at)')
.joins(:company)
.select("companies.id as company_id", "companies.name as company_name", "date(created_at)", "SUM(minutes) as minutes")
.order("date ASC")
I'm not sure the best way to go about this. I can think of a couple options:
A 3-deep loop that loops through days, then through companies, then through the found results, to add any day/company pairs that have not already been added.
Do a left join on a generate_series() for a date range in PostgreSQL and coalesce null sums to 0, but I don't think that will get me all the way
Some unknown better more elegant option
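The first option does not need three nested loops if the found rows are indexed by their (company, date) pair first. This is a minimal sketch in plain Ruby on hashes shaped like the JSON above; pad_with_zero and its arguments are hypothetical names, and dates are kept as strings for simplicity.

```ruby
# Returns one row per (company, date) pair, using 0 minutes where no
# summed row exists. All names here are illustrative assumptions.
def pad_with_zero(rows, companies, dates)
  # Index the existing sums by [company_id, date] for direct lookup.
  found = rows.to_h { |r| [[r[:company_id], r[:date]], r] }
  companies.flat_map do |company|
    dates.map do |date|
      found.fetch([company[:id], date]) do
        { company_id: company[:id], company_name: company[:name],
          date: date, minutes: 0 }
      end
    end
  end
end

companies = [{ id: 1, name: "Company A" }, { id: 2, name: "Company B" }]
dates = ["2013-06-24", "2013-06-25"]
rows = [
  { company_id: 1, company_name: "Company A", date: "2013-06-24", minutes: 987 },
  { company_id: 1, company_name: "Company A", date: "2013-06-25", minutes: 5 },
  { company_id: 2, company_name: "Company B", date: "2013-06-24", minutes: 500 }
]
padded = pad_with_zero(rows, companies, dates)
```

The result is ordered by company and then by date, so the padded Company B row for 2013-06-25 comes last, as in the expected output.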
