I have a rails 4 (ruby 2) app that tracks time for employees against various companies. I need to get a sum of the minutes per company per date. My problem is I'm not sure the best way to pad date/company pairs with 0 if there are no time entries for that company on that day.
Tables:

Companies:    id, name, ...
Time_Entries: id, created_at, company_id, minutes, ...
Current output, given only 2 companies and 2 days:
[{"company_id":1,"company_name":"Company A","date":"2013-06-24","minutes":987},
{"company_id":1,"company_name":"Company A","date":"2013-06-25","minutes":5},
{"company_id":2,"company_name":"Company B","date":"2013-06-24","minutes":500}]
Expected output: days that have no recorded time are padded with 0, so the list gains an additional item (the last item below is the new one).
[{"company_id":1,"company_name":"Company A","date":"2013-06-24","minutes":987},
{"company_id":1,"company_name":"Company A","date":"2013-06-25","minutes":5},
{"company_id":2,"company_name":"Company B","date":"2013-06-24","minutes":500},
{"company_id":2,"company_name":"Company B","date":"2013-06-25","minutes":0}]
Current Query (PostgreSQL)
@minutes = TimeEntry.where("time_entries.created_at >= ?", 1.week.ago.utc)
                    .joins(:company)
                    .group("companies.id, date(time_entries.created_at)")
                    .select("companies.id AS company_id",
                            "companies.name AS company_name",
                            "date(time_entries.created_at) AS date",
                            "SUM(time_entries.minutes) AS minutes")
                    .order("date ASC")
I'm not sure of the best way to go about this. I can think of a couple of options:

1. A 3-deep loop: loop through the days, then through the companies, then through the found results, adding any day/company pairs that have not already been added.
2. Do a LEFT JOIN on a generate_series() for the date range in PostgreSQL and COALESCE null sums to 0 (see the sketch after this list), but I don't think that will get me all the way.
3. Some unknown, better, more elegant option.
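For what it's worth, option 2 can get you all the way if you cross join the companies with the generated series of days before the LEFT JOIN, so every day/company pair exists even when no time was logged. A minimal sketch (untested, assuming the table and column names above and a one-week window):

# Pair every company with every day in the window, then left join the
# per-day minutes; day/company pairs with no entries coalesce to 0.
sql = <<-SQL
  SELECT c.id        AS company_id,
         c.name      AS company_name,
         d.day::date AS date,
         COALESCE(SUM(te.minutes), 0) AS minutes
  FROM companies c
  CROSS JOIN generate_series(date_trunc('day', now() - interval '6 days'),
                             date_trunc('day', now()),
                             interval '1 day') AS d(day)
  LEFT JOIN time_entries te
         ON te.company_id = c.id
        AND date(te.created_at) = d.day::date
  GROUP BY c.id, c.name, d.day
  ORDER BY d.day ASC, c.id ASC
SQL
rows = ActiveRecord::Base.connection.select_all(sql)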
Related
I have a model with the fields "date" and "frequency" (Frequency is an integer). I'm trying to get the top 5 frequencies per date.
Essentially I want to group by date, then get the top 5 per group.
What I have so far only retrieves the top 1 in the group:
Observation.channel("channelOne").order('date', 'frequency desc').group(:date).having('frequency = MAX(frequency)')
I want the MAX(frequency) PLUS the second, third, fourth and fifth largest PER DATE.
Sorry if this is really simple or if my terminology is off; I've just started with rails :)
You can use this:
Observation
  .select("obs1.*")
  .from("observations obs1")
  .joins("LEFT JOIN observations AS obs2 ON obs1.date = obs2.date AND obs1.frequency <= obs2.frequency")
  .group("obs1.date, obs1.id")
  .having("count(*) <= 5")
  .order("obs1.date, obs1.frequency DESC")
This query returns the top 5 frequencies for each date. It works by counting, for each row, how many rows on the same date have a frequency at least as high (including the row itself); only rows where that count is 5 or less -- the top 5 per date -- survive the HAVING clause.
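For reference, the SQL this relation generates is roughly:

SELECT obs1.*
FROM observations obs1
LEFT JOIN observations AS obs2
  ON obs1.date = obs2.date AND obs1.frequency <= obs2.frequency
GROUP BY obs1.date, obs1.id
HAVING count(*) <= 5
ORDER BY obs1.date, obs1.frequency DESC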
I have a database with a bunch of deviceapi entries that have a start_date and end_date (datetime in the schema). Typically these entries are no more than 20 seconds long (end_date - start_date). I have the following setup:
data = Deviceapi.all.where("start_date > ?", DateTime.now - 2.weeks)
I need to get the hour within data that had the highest number of Deviceapi entries. To make it a bit clearer, this was my latest try on it (code is approximated, don't mind typos):
runningtotal = 0
(2.weeks / 1.hour).to_i.times do |interval|
  current = data.select { |d| d.start_time > (start_date + (1.hour * (interval - 1))) }
                .select { |d| d.end_time < (start_date + (1.hour * interval)) }
                .count
  runningtotal = current if current > runningtotal
end
The problem: this code works just fine. So did about a dozen other incarnations of it, using .where, .select, SQL queries, etc. But it is too slow. Waaaaay too slow, because it has to loop through every hour within 2 weeks, and this method might itself need to be called dozens of times.
There has to be a faster way to do this, maybe a sort? I'm stumped, and I've been searching for hours with no luck. Any ideas?
To get adequate performance, you'll want to do everything in a single query, which will mean avoiding ActiveRecord functionality and doing a raw query (e.g. via ActiveRecord::Base.connection.execute).
I have no way to test it, since I have neither your data nor schema, but I think something along these lines will do what you are looking for:
select y.starting_hour, y.num_entries as max_entries
from
(
    select x.starting_hour, count(*) as num_entries
    from
    (
        select date_trunc('hour', start_date) as starting_hour
        from deviceapi
    ) as x
    group by x.starting_hour
) as y
order by y.num_entries desc
limit 1;
The logic of this is as follows, from the inner-most query out:
"Bucket" each starting time to the hour
From the resulting table of buckets, get the total number of entries in each bucket
Order the resulting buckets by their entry counts, descending, and take the top one, which gives you the busiest starting_hour along with its count.
If more than one bucket happens to have the same number of entries, you could pick one in a consistent way -- say, by adding starting_hour to the ORDER BY as a tiebreaker (since that would stay the same even as data gets added, assuming you are not deleting items).
If you wanted to limit the initial time slice -- I see 2 weeks referenced in your post -- you could do that in the inner-most query with a where clause bracketing the date range.
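Putting it all together from Rails in a single round trip, a sketch (untested; it assumes the deviceapi table name used above -- adjust to deviceapis if your schema follows Rails conventions):

# Bucket by hour, restrict to the last 2 weeks in SQL, order by the
# per-bucket count (ties broken by earliest hour), and take the top row.
sql = <<-SQL
  SELECT date_trunc('hour', start_date) AS starting_hour,
         count(*) AS num_entries
  FROM deviceapi
  WHERE start_date > now() - interval '2 weeks'
  GROUP BY 1
  ORDER BY num_entries DESC, starting_hour ASC
  LIMIT 1
SQL
busiest = ActiveRecord::Base.connection.select_one(sql)
# => {"starting_hour" => ..., "num_entries" => ...}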
I have a model with the following columns:

Charges model: date, fee, discount

Data:

1/1/15, 1, 1
1/1/15, 2, 1
2/2/15, 3, 3
I have a few named scopes like this_year
I want to do something like Charges.this_year.summed_up
How do I make a named scope for this?
The returned response then should be:
1/1/15, 3, 2
2/2/15, 3, 3
Assuming you have a model with a date field (e.g. published_at) and two integer fields (e.g. fee, discount), you can use the group method to run GROUP BY on published_at. Then just use the sum method if you only want the sum of one field. If you want more than one field, you have to run a select with SQL SUMs inside to get multiple column sums. Here is an example:
Charge.group(:published_at)
      .select("published_at, SUM(fee) AS sum_fee, SUM(discount) AS sum_discount")
      .order("published_at")
Note: the summed fields won't show up in the Rails console's return-value output, but they are there for you to use.
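To get the exact Charges.this_year.summed_up call from the question, you can wrap this in scopes. A sketch using the question's date/fee/discount columns (the this_year body is an assumption, since the question says it already exists):

class Charge < ActiveRecord::Base
  # Assumed definition; the question already has this scope.
  scope :this_year, -> { where(date: Date.current.beginning_of_year..Date.current.end_of_year) }
  scope :summed_up, -> {
    group(:date)
      .select("date, SUM(fee) AS sum_fee, SUM(discount) AS sum_discount")
      .order("date")
  }
end

Charge.this_year.summed_up.each do |row|
  puts "#{row.date}: #{row.sum_fee}, #{row.sum_discount}"
end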
Depending upon what end result you want, you may want to look at .group(:attribute) rather than .group_by:
Charge.group(:date).each do |charge|
  fees      = Charge.where('date = ?', charge.date).sum(:fee)
  discounts = Charge.where('date = ?', charge.date).sum(:discount)
  # ... use fees and discounts ...
end
I found this approach easier, especially if setting multiple conditions on the data you want to extract from the table.
In any case, I had an accounting model that presented this kind of issue, where I needed credit and debit plus type-of-payment info in a single table, and I spent a fruitful few hours learning all about group_by before realizing that .group() offered a simple solution.
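If all you need are the per-date totals, .group plus .sum is even shorter: it returns a hash keyed by the grouped column, with no select required. A quick sketch against the question's data:

# Keys are the grouped date values; the totals match the expected output above.
Charge.group(:date).sum(:fee)      # => {Thu, 01 Jan 2015 => 3, Mon, 02 Feb 2015 => 3}
Charge.group(:date).sum(:discount) # => {Thu, 01 Jan 2015 => 2, Mon, 02 Feb 2015 => 3}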
So far I have a query with a result set (in a temp table) with several columns, but I am only concerned with four: a customer ID (varchar), a Date (smalldatetime), an Amount (money), and a Type (char). I have multiple rows with the same customer ID and want to evaluate them based on Date, Amount and Type. For example:
Customer ID    Date       Amount    Type
A              1-1-10     200       blue
A              1-1-10     400       green
A              1-2-10     400       green
B              1-11-10    100       blue
B              1-11-10    100       red
For all occurrences of A I want to compare them and identify only one: first by earliest date, then by greatest Amount, then, if still tied, by comparing Types. I would then return one row for each customer.
I would provide some of the query but I am at home now after spending two days trying to get a correct result. It looks something like this:
(query to populate #tempTable)
GROUP BY customer_id
HAVING date_cd =
(SELECT MIN(date_cd)
FROM order_table ot
WHERE ot.customerID = #tempTable.customerID
)
OR date_cd IS NULL
I assume the HAVING would result in only one row per customer_id. This did not end up being the case since there were some ties there.
I am not sure I can do the OR - there are some NULL values here - and it does not account for stepping to the next comparison if the dates are all the same anyway. I am not seeing a way to avoid doing some row processing of the temp table with some kind of IF or WHILE loop.
As I write this, I am thinking maybe I should use #tempTable.date_cd in the HAVING clause instead of looking at the original table. But that should return the same dates?
Am I on the right track or is there something missing? Suggestions? More info??
Try the query below. Note that with GROUP BY you can only select the grouped column and aggregates, so this returns each customer's earliest date (NULLs treated as 1900-01-01) rather than the whole row:

SELECT customer_id,
       MIN(ISNULL(date_cd, '1900-01-01')) AS earliest_date
FROM #tempTable
GROUP BY customer_id
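To apply the full tie-breaking (earliest date, then greatest Amount, then Type) and still return whole rows, a ROW_NUMBER() approach fits better. A sketch, with the column names assumed from the question:

SELECT customer_id, date_cd, amount, [type]
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id
               ORDER BY ISNULL(date_cd, '1900-01-01') ASC,  -- earliest date first
                        amount DESC,                        -- then greatest amount
                        [type] ASC                          -- then type as the final tiebreaker
           ) AS rn
    FROM #tempTable t
) ranked
WHERE rn = 1;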
I'm calculating total "walk time" for a dog walking app. The Walks table has two cols, start_time and end_time. Since I want to display total time out for ALL walks for a particular dog, I should just be able to sum the two columns, subtract start_times_total from end_times_total, and the result will be my total time out. However, I'm getting strange results. When I sum the columns thusly,
start_times = dog.walks.sum('start_time')
end_times = dog.walks.sum('end_time')
BOTH start_times and end_times return the same value. Doing a sanity check, I see that my start and end times in the db are indeed set as I would expect them to be (start times in the morning, end times in the afternoon), so the sums should definitely differ between the two columns. Additionally, the value is different for each dog and in line with the relative values I would expect, with dogs with more walks returning larger values than dogs with fewer walks. So it looks like the sum is probably working, only somehow returning the same value for each column.
Btw, running this in dev Rails 3.2.3, ruby 2.0, SQLite.
I don't think that summing datetimes is a good idea. What you need is to calculate the duration of each single walk and sum those. You can do it in 2 ways:
1. DB-dependent, but more efficient:
# sqlite in dev and test modes
sql = "strftime('%s',end_time) - strftime('%s',start_time)" if !Rails.env.production?
# production with postgres
sql = "extract(epoch from end_time - start_time)" if Rails.env.production?
total = dog.walks.sum(sql)
2. DB-agnostic, but less efficient in the case of hundreds of records for each dog:
total = dog.walks.all.inject(0) { |tot, w| tot + (w.end_time - w.start_time) }
I don't know how SQLite handles datetimes and operations on that data type, but while playing in the sqlite console, I noticed that I could get reliable results when converting datetimes to seconds.
I would write it like:
dog.walks.sum("strftime('%s', end_time) - strftime('%s', start_time)")
Query should look like:
select sum(strftime('%s', end_time) - strftime('%s', start_time)) from walks;
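Either way, strftime('%s', ...) yields Unix-epoch seconds, so the sum comes back as a duration in whole seconds; format it however you need. A small usage sketch:

# The summed difference is an integer number of seconds.
total_seconds = dog.walks.sum("strftime('%s', end_time) - strftime('%s', start_time)")
hours, remainder = total_seconds.divmod(3600)
minutes = remainder / 60
puts "Total time out: #{hours}h #{minutes}m"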