I'm calculating the total "walk time" for a dog walking app. The Walks table has two columns, start_time and end_time. Since I want to display the total time out for ALL walks for a particular dog, I should just be able to sum the two columns, subtract the start_times total from the end_times total, and the result will be my total time out. However, I'm getting strange results. When I sum the columns like so,
start_times = dog.walks.sum('start_time')
end_times = dog.walks.sum('end_time')
BOTH start_times and end_times return the same value. Doing a sanity check, I see that my start and end times in the db are indeed set as I would expect them to be (start times in the morning, end times in the afternoon), so the sums should definitely differ between the two columns. Additionally, the value is different for each dog and in line with the relative values I would expect, so dogs with more walks return larger values than dogs with fewer walks. So it looks like the sum is probably working, only somehow returning the same value for each column.
Btw, I'm running this in development on Rails 3.2.3, Ruby 2.0, SQLite.
I don't think summing datetimes is a good idea (SQLite stores them as text, so SUM most likely just coerces each value to its leading number, the year, which would explain why both columns come out the same). What you need is to calculate the duration of each individual walk and sum those. You can do it in two ways:
1. DB-dependent, but more efficient:
sql = if Rails.env.production?
        # production with postgres
        "extract(epoch from end_time - start_time)"
      else
        # sqlite in dev and test modes
        "strftime('%s', end_time) - strftime('%s', start_time)"
      end
total = dog.walks.sum(sql)
2. DB-agnostic, but less efficient if there are hundreds of records for each dog:
total = dog.walks.all.inject(0) { |tot, w| tot + (w.end_time - w.start_time) }
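Either way the total comes back as a number of seconds (a Float in the Ruby version, since subtracting two times yields seconds). If you then want to display it as hours and minutes, a small helper along these lines would do (format_walk_time is just a hypothetical name, not something from the original app):
# Hypothetical helper: turn a total number of seconds into "HH:MM".
def format_walk_time(total_seconds)
  hours, rest = total_seconds.to_i.divmod(3600)
  minutes = rest / 60
  format('%02d:%02d', hours, minutes)
end

format_walk_time(total)   # => e.g. "03:45"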
I don't know how SQLite handles datetimes and operations on this data type, but while playing in the SQLite console I noticed that I could get reliable results when converting the datetimes to seconds.
I would write it like:
dog.walks.sum("strftime('%s', end_time) - strftime('%s', start_time)")
Query should look like:
select sum(strftime('%s', end_time) - strftime('%s', start_time)) from walks;
Related
I'm relatively new to SQL and ORMs.
Let's say I have a database table with start_at and finish_at fields (both datetime). The table contains 10,000 items, for example.
How do I calculate the average and max/min duration (finish_at - start_at) using Ruby or ActiveRecord tools? There is no need to store it anywhere, I just need the numbers.
table = MyClass.arel_table
duration = table[:finish_at] - table[:start_at]
MyClass.pick(duration.maximum, duration.minimum, duration.average)
#=> ["125 days 20:46:34.05816", "00:00:00.063579", "20 days 23:30:16.221092"]
Cast them to another data type as needed or use directly in other queries.
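Note that pick and the Arel attribute arithmetic above need a fairly recent Rails (pick was added in Rails 6). On older versions, roughly the same aggregates can be pulled with raw SQL fragments instead, assuming PostgreSQL, where subtracting timestamps yields an interval:
# Sketch for older Rails; on Rails 6+ wrap these strings in Arel.sql(...)
MyClass.pluck(
  "MAX(finish_at - start_at)",
  "MIN(finish_at - start_at)",
  "AVG(finish_at - start_at)"
).first
#=> [max, min, avg] interval strings, e.g. "125 days 20:46:34.05816"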
In my Expense model I have a date attribute called payment_date. This is a Date format and not DateTime.
In one of my views I'm displaying this data in a few different formats, and I want to avoid multiple queries.
For example, right next to Expense.all I need to display expenses year to date. Rather than running two queries to pull essentially the same information, I thought I would try to pluck the YTD data from @expenses = Expense.all.
Right now I'm trying to use:
@expenses.select { |ex| ex.payment_date > Date.today.beginning_of_year }
but this is returning a blank array.
Is it possible to select results by date, and where am I messing up?
To include Jan 1 of this year in your YTD expenses, use >= instead of > in your select block.
Since you tagged this with Rails, an even more performant way to query this is by using ActiveRecord/SQL.
If you have many records, doing #expenses = Expense.all and then using the Ruby enumerable select on that collection will load all of the expenses from the DB into memory. This could be quite slow, or could even cause out-of-memory errors!
You can do (assuming the DB is Postgres):
@ytd_expenses = Expense.where("payment_date >= ?", Date.today.beginning_of_year)
This will only return the results you care about from the DB.
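And if you really do want a single round trip feeding both the full list and the YTD subset, and the table is small enough to load, one option is to load once and split the array in memory (a sketch only; @older_expenses is just an illustrative name):
@expenses = Expense.order(payment_date: :desc).to_a
@ytd_expenses, @older_expenses =
  @expenses.partition { |ex| ex.payment_date >= Date.today.beginning_of_year }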
I have a database with a bunch of deviceapi entries that have a start_date and end_date (datetime in the schema). Typically these entries are no more than 20 seconds long (end_date - start_date). I have the following setup:
data = Deviceapi.all.where("start_date > ?", DateTime.now - 2.weeks)
I need to get the hour within data that had the highest number of Deviceapi entries. To make it a bit clearer, this was my latest attempt at it (code is approximated, don't mind the typos):
runningtotal = 0
(2.weeks / 1.hour).to_i.times do |interval|
  current = data.select { |d| d.start_time > (start_date + (1.hour * (interval - 1))) }
                .select { |d| d.end_time < (start_date + (1.hour * interval)) }
                .count
  if current > runningtotal
    runningtotal = current
  end
end
The problem: this code works just fine. So did about a dozen other incarnations of it, using .where, .select, SQL queries, etc. But it is too slow. Waaaaay too slow, because it has to loop through every hour within 2 weeks. And this method might itself need to be called dozens of times.
There has to be a faster way to do this, maybe a sort? I'm stumped, and I've been searching for hours with no luck. Any ideas?
To get adequate performance, you'll want to do everything in a single query, which will mean avoiding ActiveRecord functionality and doing a raw query (e.g. via ActiveRecord::Base.connection.execute).
I have no way to test it, since I have neither your data nor schema, but I think something along these lines will do what you are looking for:
select y.starting_hour, y.num_entries as max_entries
from
(
    select x.starting_hour, count(*) as num_entries
    from
    (
        select date_trunc('hour', start_time) as starting_hour
        from deviceapi as d
    ) as x
    group by x.starting_hour
) as y
where y.num_entries =
(
    select max(z.num_entries)
    from
    (
        select count(*) as num_entries
        from deviceapi
        group by date_trunc('hour', start_time)
    ) as z
);
The logic of this is as follows, from the inner-most query out:
"Bucket" each starting time to the hour
From the resulting table of buckets, get the total number of entries in each bucket
Get the maximum number of entries from that table, and then use that number to match back to get the starting_hour itself.
If more than one bucket happens to have the same number of entries, you could determine a consistent way to pick one -- say the min(starting_hour) or similar (since that would stay the same even as data gets added, assuming you are not deleting items).
If you wanted to limit the initial time slice -- I see 2 weeks referenced in your post -- you could do that in the inner-most query with a where clause bracketing the date range.
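For reference, a sketch of running such a query from Rails via ActiveRecord::Base.connection.execute could look like the following (the deviceapi table and start_time column are taken from the query above; this variant simply orders the hourly buckets by their count and takes the top row, with the two-week window bracketed in the where clause):
sql = <<-SQL
  select date_trunc('hour', start_time) as starting_hour, count(*) as num_entries
  from deviceapi
  where start_time > now() - interval '2 weeks'
  group by starting_hour
  order by num_entries desc
  limit 1
SQL
busiest = ActiveRecord::Base.connection.execute(sql).first
# busiest["starting_hour"] is the hour, busiest["num_entries"] its count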
So far I have a query with a result set (in a temp table) with several columns, but I am only concerned with four. One is a customer ID (varchar), one is Date (smalldatetime), one is Amount (money), and the last is Type (char). I have multiple rows with the same customer ID and want to evaluate them based on Date, Amount and Type. For example:
Customer ID   Date      Amount   Type
A             1-1-10    200      blue
A             1-1-10    400      green
A             1-2-10    400      green
B             1-11-10   100      blue
B             1-11-10   100      red
For all occurrences of A, I want to compare them to identify only one: first by earliest date, then by greatest Amount, then, if still tied, by comparing Types. I would then return one row for each customer.
I would provide some of the query but I am at home now after spending two days trying to get a correct result. It looks something like this:
(query to populate #tempTable)
GROUP BY customer_id
HAVING date_cd =
(SELECT MIN(date_cd)
FROM order_table ot
WHERE ot.customerID = #tempTable.customerID
)
OR date_cd IS NULL
I assumed the HAVING would result in only one row per customer_id. This did not end up being the case, since there were some ties there.
I am not sure I can do the OR - there are some NULL values here - and it does not account for stepping to the next comparison if the dates were all the same anyway. I am not seeing a way to avoid doing some row processing of the temp table with some kind of IF or WHERE loop.
As I write this, I am thinking maybe I should use #tempTable.date_cd in the HAVING clause instead of looking at the original table, but that should return the same dates?
Am I on the right track or is there something missing? Suggestions? More info??
Try the query below:
select t.*
from #tempTable t
join (select customer_id, min(isnull(date_cd, '1900-01-01')) as min_date
      from #tempTable group by customer_id) m
  on m.customer_id = t.customer_id
 and isnull(t.date_cd, '1900-01-01') = m.min_date
I have a Rails 4 (Ruby 2) app that tracks time for employees against various companies. I need to get a sum of the minutes per company per date. My problem is that I'm not sure of the best way to pad date/company pairs with 0 if there are no time entries for that company on that day.
Tables
Companies:    id, name, ...
Time_Entries: id, created_at, company_id, minutes, ...
Current output, given only 2 companies and 2 days:
[{"company_id":1,"company_name":"Company A","date":"2013-06-24","minutes":987},
{"company_id":1,"company_name":"Company A","date":"2013-06-25","minutes":5},
{"company_id":2,"company_name":"Company B","date":"2013-06-24","minutes":500}]
The expected output, padding days that aren't recorded with 0s, has one additional item in the list below; the last item is the new one.
[{"company_id":1,"company_name":"Company A","date":"2013-06-24","minutes":987},
{"company_id":1,"company_name":"Company A","date":"2013-06-25","minutes":5},
{"company_id":2,"company_name":"Company B","date":"2013-06-24","minutes":500},
{"company_id":2,"company_name":"Company B","date":"2013-06-25","minutes":0}]
Current Query (PostgreSQL)
@minutes = TimeEntry.where("created_at >= ?", 1.week.ago.utc)
.group('companies.id, date(created_at)')
.joins(:company)
.select("companies.id as company_id", "companies.name as company_name", "date(created_at)", "SUM(minutes) as minutes")
.order("date ASC")
I'm not sure of the best way to go about this. I can think of a couple of options:
A 3-deep loop that loops through days, then through companies, then through the found results to add any day/company pairs that have not already been added (roughly what the sketch after this list does).
Do a left join on a generate_series() for a date range in PostgreSQL and coalesce null sums to 0, but I don't think that will get me all the way.
Some unknown better more elegant option
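For what it's worth, here is a minimal sketch of the first option: padding missing company/date pairs in Ruby after the grouped query above has run (the Company.pluck call and the hash keys are assumptions for illustration, not code from the app):
# Build the full company x date grid and fill any gaps with 0 minutes.
found = @minutes.map do |r|
  { "company_id" => r.company_id, "company_name" => r.company_name,
    "date" => r.date.to_s, "minutes" => r.minutes }
end

dates = (1.week.ago.to_date..Date.today).map(&:to_s)
padded = Company.pluck(:id, :name).flat_map do |id, name|
  dates.map do |date|
    found.find { |r| r["company_id"] == id && r["date"] == date } ||
      { "company_id" => id, "company_name" => name, "date" => date, "minutes" => 0 }
  end
end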