How to Group by last 20 days and do an aggregate function? - ruby-on-rails

I can't seem to figure this one out. I'm trying to get the standard deviation of a column for the past 20 days. Here is what I have
Model.where('date < ?','2013-03-25')
.group('date')
.order('date DESC')
.limit(20)
.select('stddev_samp(percent_change) as stdev')
However all I'm getting is 20 entries of Nil. I was expecting 1 entry of the standard deviation.
After switching stddev_samp to sum, I see that I'm getting nil because a standard deviation can't be computed from a single entry. That is, the query is not aggregating over the 20 rows as I expected, but calculating the standard deviation for each date separately.
So my question is, how do I get stddev of the last 20 days? I know it's possible to simply choose select percent_change and then calculate the standard deviation in ruby, but I assume that the aggregate function stddev_samp should be usable in this case.
I am using Rails 3.2 and PostgreSQL 9.2.

I'm not a Ruby guy so I'll explain it in normal SQL:
What you're doing is:
SELECT stddev_samp(percent_change) as stdev
FROM tbl
WHERE date < '2013-03-25'
GROUP BY date
ORDER BY date DESC
LIMIT 20;
This calculates the deviation for each day separately, not across all of them, and the sample standard deviation of a single element is NULL.
Removing the GROUP BY would fix that, but the aggregate would then run over every row matching the WHERE clause rather than just the last 20 entries (LIMIT is applied after aggregation), so we need a subquery:
SELECT stddev_samp(percent_change) as stdev
FROM
(SELECT percent_change
FROM tbl
WHERE date < '2013-03-25'
ORDER BY date DESC
LIMIT 20) AS q
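To see why the grouped query yields only NULLs, here is a minimal plain-Ruby sketch that mimics both queries on in-memory data. The `stddev_samp` method and the sample rows are hypothetical stand-ins written for this illustration, not part of Rails or PostgreSQL.

```ruby
# Sample standard deviation (n - 1 denominator), mirroring what
# PostgreSQL's stddev_samp computes; returns nil for fewer than two
# values, just as stddev_samp returns NULL.
def stddev_samp(values)
  return nil if values.size < 2

  mean = values.sum.to_f / values.size
  sum_sq = values.sum { |v| (v - mean)**2 }
  Math.sqrt(sum_sq / (values.size - 1))
end

# Hypothetical rows: one [date, percent_change] pair per day.
rows = [
  ['2013-03-21', 2.0],
  ['2013-03-22', 4.0],
  ['2013-03-23', 4.0],
  ['2013-03-24', 6.0]
]

# GROUP BY date: every group holds a single value, so each result is nil.
per_day = rows.group_by(&:first).map { |_, group| stddev_samp(group.map(&:last)) }

# Subquery approach: pick the most recent rows first, then aggregate once.
latest = rows.sort_by(&:first).reverse.first(20).map(&:last)
overall = stddev_samp(latest)
```

Here `per_day` contains one nil per date, while `overall` is a single number computed across all selected rows.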

No need to 'Group By', 'Order by' or sub-selects. Just get the records for the last 20 days and run the aggregate function on them.
Ruby:
Model.where('date >= ?', Date.today - 20.days).select('stddev_samp(percent_change) as stdev').first['stdev']
SQL:
select stddev_samp(percent_change) as stdev
from <table>
where date >= now() - interval '20 days';
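If you want to use the LAST 20 RECORDS, not the last 20 days, the aggregate has to run over a sub-select, because ORDER BY and LIMIT are applied after aggregation:
Ruby:
Model.from("(#{Model.order('date desc').limit(20).to_sql}) as q").select('stddev_samp(percent_change) as stdev').first['stdev']
SQL:
select stddev_samp(percent_change) as stdev
from (select percent_change
from <table>
order by date desc
limit 20) as q;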

You don't need the GROUP BY, since you don't want one value for each date.
Also, your LIMIT might not cover exactly 20 days if a date has multiple values or a date is missing.
Try this:
SELECT stddev_samp(percent_change) as stdev
FROM
(SELECT percent_change
FROM tbl
WHERE date > now() - interval '20 days') AS q
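The difference between "last N records" and "last N days" can be illustrated with a small plain-Ruby sketch. The rows, the fixed `today` value, and the window of 3 (instead of 20, to keep the data short) are all assumptions made for this example.

```ruby
require 'date'

# Hypothetical data: two readings on one day and a gap with no readings.
rows = [
  { date: Date.new(2013, 3, 1),  percent_change: 1.0 },
  { date: Date.new(2013, 3, 20), percent_change: 2.0 },
  { date: Date.new(2013, 3, 20), percent_change: 3.0 },
  { date: Date.new(2013, 3, 24), percent_change: 4.0 }
]
today = Date.new(2013, 3, 25) # fixed "now" so the example is reproducible

# Last 3 records: LIMIT counts rows, so the duplicate date uses up two slots.
last_records = rows.sort_by { |r| r[:date] }.last(3)

# Last 3 days: a date filter counts calendar days, however many rows they hold.
last_days = rows.select { |r| r[:date] > today - 3 }
```

With this data the record-based window reaches back to 20 March, while the day-based window contains only the single row from 24 March.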

Related

How to run Rails queries over multiple date ranges (weeks)

I'm trying to iterate over each week in the calendar year and run a query.
range = Date.new(2020,3,16)..Date.new(2020,3,22)
u = User.where(created_at: range).count
But I'd like to do this for EACH week in another range (say since the beginning of this year).
Ruby's Date has a cweek function that gives you the week number but there doesn't seem to be a way to easily get from the week number to the date range.
Anyway, not sure how helpful cweek will be as I need week to run Sunday -> Saturday.
Thoughts?
I'm assuming this is Postgres and the model name is User based on your previous question.
If this blog is to be believed, you can shift a date by one day to get a Sunday-to-Saturday week.
User.group("(date_trunc('week', created_at::date + 1)::date - 1)")
.count
If you want to select the actual week number while you are at it, you can select raw data from the database instead of using ActiveRecord::Calculations#count, which is pretty limited.
class User
# @return [ActiveRecord::Result]
# the raw query results with the columns count, year, week
def self.count_by_biblical_week
connection.select_all(
select(
"count(*) as count",
"date_part('year', created_at)::integer as year",
"(date_part('week', created_at::date + 1) - 1)::integer as week"
).group(:week, :year)
)
end
end
Usage:
results = User.where(created_at: Date.new(2020,3,16)..Date.new(2020,3,22))
.count_by_biblical_week
results.each do |row|
puts [row['year'], row['week'], row['count']].join(' | ')
end
Adding the year to the group avoids ambiguity if the results span multiple years.
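If you would rather iterate over explicit date ranges, as in the question, the Sunday-to-Saturday weeks can also be built in plain Ruby. This is a rough sketch using only the standard library; `sunday_week` and `sunday_weeks` are hypothetical helper names, not Rails methods.

```ruby
require 'date'

# Returns the Sunday..Saturday range containing the given date.
# Date#wday is 0 for Sunday, so subtracting it lands on the week's Sunday.
def sunday_week(date)
  start = date - date.wday
  start..(start + 6)
end

# Returns every Sunday..Saturday week that overlaps first..last.
def sunday_weeks(first, last)
  weeks = []
  week = sunday_week(first)
  while week.begin <= last
    weeks << week
    week = sunday_week(week.begin + 7)
  end
  weeks
end

weeks = sunday_weeks(Date.new(2020, 1, 1), Date.new(2020, 3, 22))
```

Each range could then be passed to a query like the `User.where(created_at: range).count` call from the question. That costs one query per week, so the grouped query is likely the better fit for long ranges.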

Query influxdb for a date

I have a table in InfluxDB that has a column called 'expirydate'. The column holds a few dates, e.g. "2016-07-14" or "2016-08-20". I want to select only the rows with the date 2016-07-14, but I am unsure how.
My query is currently:
SELECT * FROM tablee where expirydate = '2016-07-14' limit 1000
But this does not work. Can someone please help me?
Assuming tablee is a valid measurement...
If you want to select all of the points for the day '2016-07-14', then your query should look something like this.
Query:
SELECT * FROM tablee where time >= '2016-07-14 00:00:00' and time < '2016-07-15 00:00:00'
You might also be interested in how InfluxDB handles date time strings in queries.
See:
https://docs.influxdata.com/influxdb/v0.9/query_language/data_exploration/#relative-time
Date time strings Specify time with date time strings. Date time
strings can take two formats: YYYY-MM-DD HH:MM:SS.nnnnnnnnn and
YYYY-MM-DDTHH:MM:SS.nnnnnnnnnZ, where the second specification is
RFC3339. Nanoseconds (nnnnnnnnn) are optional in both formats.
Note:
The LIMIT clause may be redundant in your original query; it is only there to stop the query from returning more than 1,000 points.
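The half-open time range used in the query above (from midnight of the day up to, but not including, midnight of the next day) can be generated from a date string. A minimal Ruby sketch follows; `day_query` is a hypothetical helper, and it assumes the measurement name comes from trusted input since it is interpolated directly.

```ruby
require 'date'

# Builds an InfluxQL query with a half-open [day, next day) time filter.
# Assumes `measurement` is a trusted identifier (no escaping is done here).
def day_query(measurement, day_string)
  day = Date.strptime(day_string, '%Y-%m-%d')
  from = day.strftime('%Y-%m-%d 00:00:00')
  to = (day + 1).strftime('%Y-%m-%d 00:00:00')
  "SELECT * FROM #{measurement} WHERE time >= '#{from}' AND time < '#{to}'"
end

query = day_query('tablee', '2016-07-14')
```

Using `Date` arithmetic for the upper bound keeps month and year rollovers correct without any manual handling.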
I had to force influx to treat my 'string date' as a string. This works:
SELECT * FROM tablee where expirydate=~ /2016-07-14/ limit 1000;

InfluxDB average of distinct count over time

Using Influx DB v0.9, say I have this simple query:
select count(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(1m)
Which gives results like:
08:00 5
08:01 10
08:02 5
08:03 10
08:04 5
Now I want a query that produces points with an average of those values over 5 minutes. So the points are now 5 minutes apart, instead of 1 minute, but are an average of the 1 minute values. So the above 5 points would be 1 point with a value of the result of (5+10+5+10+5)/5.
This does not produce the results I am after, for clarity, since this is just a count, and I'm after the average.
select count(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)
This doesn't work (gives errors):
select mean(distinct("id")) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)
Also doesn't work (gives error):
select mean(count(distinct("id"))) FROM "main" WHERE time > now() - 30m and time < now() GROUP BY time(5m)
In my actual usage "id" is a string (content, not a tag, because count distinct not supported for tags in my version of InfluxDB).
To clarify a few points for readers: in InfluxQL, functions like COUNT() and DISTINCT() can only accept fields, not tags. While COUNT() supports nesting the DISTINCT() function, most nested or sub-functions are not yet supported. Nested queries, subqueries, and stored procedures are not supported either.
However, there is a way to address your need using continuous queries, which are a way to automate the processing of data and writing those results back to the database.
First take your original query and make it a continuous query (CQ).
CREATE CONTINUOUS QUERY count_foo ON my_database_name BEGIN
SELECT COUNT(DISTINCT("id")) AS "1m_count" INTO main_1m_count FROM "main" GROUP BY time(1m)
END
There are other options for the CQ, but that basic one will wake up every minute, calculate the COUNT(DISTINCT("id")) for the prior minute, and then store that result in a new measurement, main_1m_count.
Now, you can easily calculate your 5 minute mean COUNT from the pre-calculated 1 minute COUNT results in main_1m_count:
SELECT MEAN("1m_count") FROM main_1m_count WHERE time > now() - 30m GROUP BY time(5m)
(Note that by default, InfluxDB uses epoch 0 and now() as the lower and upper time range boundaries, so it is redundant to include and time < now() in the WHERE clause.)
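The two-step approach (store 1-minute distinct counts, then average them over 5-minute windows) can be sketched in plain Ruby to show what each step computes. The per-minute id batches below are made-up sample data, not output from InfluxDB.

```ruby
# Hypothetical per-minute batches of "id" values (five minutes of data).
minutes = [
  %w[a b],
  %w[a b c],
  %w[a],
  %w[b c],
  %w[a b]
]

# Step 1 (what the continuous query stores): distinct count per minute.
counts_1m = minutes.map { |ids| ids.uniq.size }

# Step 2 (the follow-up query): mean of those counts per 5-minute window.
means_5m = counts_1m.each_slice(5).map { |window| window.sum.to_f / window.size }

# For contrast: a distinct count taken over the whole 5-minute window.
distinct_5m = minutes.flatten.uniq.size
```

The contrast value shows why simply widening GROUP BY time(1m) to time(5m) answers a different question: it counts distinct ids across the window instead of averaging the per-minute counts.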

Earliest date after current date without associated record

I have a Rails model DailyAssignment with a date column, and would like to find the first date after today which does not have a DailyAssignment associated with it.
For instance, if I have an instance today, no instance tomorrow, and an instance the day after tomorrow, this method should return tomorrow.
If I were to do this in Ruby, it would be something like:
(Date.today..1.year.since.to_date).find do |date|
DailyAssignment.where(date: date).empty?
end
This is sort of okay, since the iteration terminates as soon as it finds a date without a record, but it has two issues:
Iterating through a collection in Ruby is slow.
Barring some sort of while construct, I need to specify an 'end' date.
Is there a nice, efficient way to do this in PostgreSQL?
If you can, you should use a custom query to search through your database (this kind of search is a lot faster within the DB).
If you search for a date within a time range, you can use the
generate_series(timestamp, timestamp, interval) function:
select s
from generate_series(?, ? + interval '1 year', interval '1 day') s
left join daily_assignment on s = "date"
where "date" is null
order by s
limit 1
If you have no real upper bound, you can use a self-join to get the next free date:
select coalesce(
(select c."date" + interval '1 day'
from daily_assignment c
left join daily_assignment n on n."date" = c."date" + interval '1 day'
where c."date" > ? - interval '1 day'
and n."date" is null
order by c."date"
limit 1),
? + interval '1 day'
)
The ? placeholders stand for today's date (you may need casts, depending on your input); you could use now() instead, if you prefer.
P.S.: please, do not use date as a column name, it is a reserved word in SQL, and tells nothing about the column itself. Instead, you can use names like created_at, updated_at, happens_at, etc. or even at_date.
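For comparison, the result being asked for (the first date after today with no record) can be sketched in plain Ruby once the taken dates are in memory. This is not a translation of the SQL above, only an illustration of the intended outcome; `first_free_date` is a hypothetical helper and the dates are sample values.

```ruby
require 'date'
require 'set'

# Returns the first date after `today` that is not in `taken_dates`.
# No upper bound is needed: the loop stops at the first gap.
def first_free_date(taken_dates, today)
  taken = taken_dates.to_set
  candidate = today + 1
  candidate += 1 while taken.include?(candidate)
  candidate
end

today = Date.new(2024, 1, 10) # hypothetical "today"
taken = [today, today + 2]    # records today and the day after tomorrow
free = first_free_date(taken, today)
```

With a record today and one the day after tomorrow, `free` is tomorrow, matching the example in the question. The set lookup keeps each membership check cheap once the dates have been loaded with a single query.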
What I propose is to do one select query between two dates, then loop over the date range and compare each date with the selected results.
# select all daily assignments in the range with a single query
from_date = Date.today
to_date = 1.year.since.to_date
results = DailyAssignment.where('date >= ? AND date <= ?', from_date, to_date)
not_found_dates = []
# collect every date in the range that has no assignment
(from_date..to_date).each do |date|
found_assignment = results.detect { |instance| instance.date == date }
not_found_dates << date if found_assignment.nil?
end
You can try it this way:
def first_date_without_assignment
# order by date so that the last element is the latest assignment
assignments = DailyAssignment.select('date').where('date > ?', Date.today).order('date')
return Date.tomorrow if assignments.empty?
assignment_dates = assignments.map(&:date)
date_range = (Date.tomorrow..assignment_dates.last.advance(days: 1)).to_a
(date_range - assignment_dates).first
end
I didn't test it, so I may have mistyped something, but it should work. I also found this, which should work on Postgres: http://www.postgresql.org/message-id/4F96EC90.6070600#encs.concordia.ca but it could be quite hard to write in Rails, or at least bad looking.

Select rows from the last 12 months starting on the last date inserted on the DB

I have a system about cars and parking tickets. I had a requirement to implement where I had to get all the tickets from the last 12 months, so I opened this question.
The requirement has changed and now I need to get the tickets from the last 12 months starting on the last ticket's date.
I know how to do that using SQL (postgres), it would be something like this example:
select *
from parking_tickets
where car_id = 25
AND
date > (select date from parking_tickets where car_id = 25 order by date desc limit 1) - INTERVAL '12 months'
order by date desc
But I would rather have it in ActiveRecord. Is there any way?
I could insert the subquery itself inside the where clause, but it would not be as nice as I would like to.
Is there a nice way to make this, something like this?
@cars = Car.includes(:parkingTickets)
.where('parkingTickets.date >= ?', MAX(parkingTickets.date) - 12.months)
.order('ID, parkingTickets.date desc')
I would like to have it done in a list of cars, so making the query before and then inserting this value in the query would not be an elegant solution, since I would have an array.
This solution should work:
last_date = ParkingTicket.where(car_id: 25).order(date: :desc).first.date
car = Car.includes(:parking_tickets)
.where(id: 25, parking_tickets: { date: (last_date - 12.months)..last_date })
.first
# sort in Ruby: calling .order on the association would re-query and drop the date filter
car.parking_tickets.sort_by(&:date)
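Since the question asks about a list of cars, here is a plain-Ruby sketch of the per-car logic: find each car's latest ticket date, then keep the tickets from the 12 months before it. The ticket data is invented for illustration, and the code works on in-memory pairs rather than ActiveRecord objects.

```ruby
require 'date'

# Hypothetical tickets as [car_id, date] pairs.
tickets = [
  [25, Date.new(2015, 1, 10)],
  [25, Date.new(2016, 2, 1)],
  [25, Date.new(2016, 6, 15)],
  [30, Date.new(2014, 5, 5)],
  [30, Date.new(2014, 1, 1)]
]

# For each car: find its latest ticket date, then keep the tickets that fall
# within the 12 months before that date (Date#<< shifts back by months).
recent_by_car = tickets.group_by(&:first).transform_values do |rows|
  dates = rows.map(&:last)
  last_date = dates.max
  dates.select { |d| d > (last_date << 12) }.sort.reverse
end
```

The comparison is strict (`>`), mirroring the `date > ... - INTERVAL '12 months'` condition in the SQL from the question, and the dates come back newest first.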
