InfluxDB average for day of the week

Currently my data in Influx is the following:
measurement: revenue_count
field: users
field: revenue #
timestamp: (auto generated by influx)
What I'm looking to do is find a way to get the average revenue for a given day of the week, i.e. what is the average revenue for Monday, Tuesday, etc.
What's the best way to do this in influx?

You should use continuous queries to schedule automated rollups/downsampling and then select the data from these pre-calculated series.
If you don't have too many points, you might not need the CQs; in that case an on-the-fly GROUP BY will most likely be enough.
I wasn't able to find information on whether you can "select all points for a certain day" by just specifying a date. As far as I know, this is currently not possible: if you specify something like time = '2016-02-22', it effectively means 2016-02-22 00:00:00 (it does not mean "give me everything from 22 Feb 2016").
What you may need to do instead is specify an interval (two time points) between which you expect your downsampled point to fall.
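A minimal sketch of such a continuous query, assuming InfluxQL 1.x, a database called mydb and a target measurement called daily_revenue (both names are placeholders, not from the question):
CREATE CONTINUOUS QUERY "cq_daily_revenue" ON "mydb"
BEGIN
  SELECT MEAN("revenue") AS "mean_revenue"
  INTO "daily_revenue"
  FROM "revenue_count"
  GROUP BY time(1d)
END
The CQ runs automatically at the end of each 1d interval and writes one pre-aggregated point per day, which keeps any later weekday averaging cheap.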

InfluxDB has no concept of days of the week. You can get the average revenue per day, where a day runs midnight to midnight UTC, with the following:
SELECT MEAN(revenue) FROM revenue_count WHERE time > now() - 7d GROUP BY time(1d)
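If you do need the per-weekday breakdown from the question (average for Monday, Tuesday, and so on), that grouping has to happen outside InfluxQL. One option is to export the daily means into a relational database and bucket them there; a sketch against a hypothetical PostgreSQL table daily_revenue(day date, mean_revenue double precision):
-- isodow: 1 = Monday ... 7 = Sunday (PostgreSQL)
SELECT extract(isodow FROM day) AS weekday,
       avg(mean_revenue)        AS avg_revenue
FROM daily_revenue
GROUP BY 1
ORDER BY 1;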

Related

Predict a future date based on an average

I am trying to create a formula that will help me predict a future date based on an average time per day.
For example, I have a range of dates [1/12/2022, 5/12/2022, 15/12/2022], and each date has a number of hours spent on that day [4, 2, 12]. At the moment I have a formula that works out the average per day by dividing the total hours by the number of days between the start date and the current date.
What I want is to then predict the date on which, based on this average (say 4 hours per day), I will reach a goal of 2000.
An example sheet would look like this -
If the scenario below is your input data, then the following formula may help.
=C2+ROUNDUP(B2/A2,0)
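As a worked example, assuming A2 holds the average hours per day, B2 the hours still needed to reach the goal, and C2 the current date (the sheet layout itself is not reproduced here): with A2 = 4 and B2 = 2000, ROUNDUP(2000/4, 0) = 500, so the formula returns the date 500 days after C2.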

InfluxDB integral gives high value after missing data

I am storing Amps, Volts and Watts in InfluxDB in a measurement/table called "Power". The update frequency is approximately every second. I can use the integral function to get power usage (in amp-hours or watt-hours) on an hourly basis. This is working very nicely, so I can get a graph of power used each hour over a 24-hour period. My query is below.
The issue is that if there is a gap in the data, I get a huge spike in the result when the data returns, e.g. if data was missing from 3 PM to 5:45 PM, then the 5 PM result shows a huge spike. The reason, as far as I can see, is that there is close to a 3-hour gap, so it just calculates the area under the graph and lumps it into the 5 PM value. Can I avoid that?
SELECT INTEGRAL(Watts) FROM Power WHERE time > now() - 24h GROUP BY time(1h)
I had a similar issue with Influx. It turns out that integral() doesn't support fill() as noted by Yuri Lachin in the comments.
Since you're grouping by hours anyway, the average power (watts) over an hour is numerically equal to the energy consumed in that hour (watt-hours), so you can use the mean() value here and you should get the correct result.
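As a quick worked check: if the load averages 600 W across a one-hour bucket, the energy for that bucket is 600 W × 1 h = 600 Wh, which is exactly the number mean() reports for that hour.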
The query I'm using is:
SELECT mean("load_power") AS "load"
FROM "power_readings"
WHERE $timeFilter
GROUP BY time(1h) fill(0)
For daily numbers I can go back to using integral(), because I rarely have gaps in the data that span multiple days, so no fill is needed.
Since you can use the fill() function in this query, you can decide which of the various fill options makes the most sense (see https://docs.influxdata.com/influxdb/v1.7/query_language/data_exploration/#group-by-time-intervals-and-fill).
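For that daily rollup, a sketch using the measurement and field names from the question (the 7-day window is only an example; the 1h unit argument makes INTEGRAL report watt-hours instead of the default watt-seconds):
SELECT INTEGRAL("Watts", 1h) AS "watt_hours" FROM "Power" WHERE time > now() - 7d GROUP BY time(1d)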
You need to use fill() in the GROUP BY section of the query (see the Influx docs for fill() usage).
In your case fill(none) or fill(0) should do the job.

Rolling date monthly option in Adobe Analytics scheduled reports

I need to schedule reports on a monthly basis. The reports need to go out on the 1st of each month with data from the previous month. For reporting, I have selected the preset date option 'Last Month' with the monthly rolling date option. Right now it says: Rolling date options: 04/01/2017 (rolling monthly) - 04/30/2017 (rolling monthly), which is how it should be. But I am concerned about whether it accounts for the varying number of days in a month (30, 31). Can someone confirm whether, when the next set of reports goes out on 6/1, the date range will be 05/01/2017 to 05/30/2017 or 05/01/2017 to 05/31/2017?
If it doesn't consider the number of days in a month, is there an alternative to this setup for achieving the same results?
"Rolling monthly" does just as the name implies. When the date shifts from 4/30 to 5/1, the report will update to April data. When the date shifts from 5/31 to 6/1, the report will update to May data. If on 6/15 you again open the report, it will still be populated with May data, because the month has not yet rolled.

Timezone-aware Postgres query to create a time series for minutes, hours, days

I am having a hard time figuring out how to deal with the following problem:
Our company publishes posts to social media platforms. Those posts are stored in the database once they have been successfully posted.
We want to provide a dashboard showing an overview of how many posts the user published over a time period grouped by minutes, hours and days.
I want to display the results as a time series graph.
This would work fine, but it gets very tricky once I have to support multiple time zones when aggregating/grouping by days (posts around midnight belong to different days depending on which time zone you are in).
My current solution builds the Postgres query using Rails ActiveRecord. The problem I am facing is that I am struggling to deal with the timezone conversions...
Also, I am not particularly good at Postgres...
The current implementation essentially looks like this (I removed irrelevant code):
Publication.select(
%{date_trunc('#{interval}',
published_at::timestamptz at time zone interval '#{time_zone_offset}')::timestamptz as time,
count(published_at)})
.where(%(published_at BETWEEN
timestamptz '#{start_date}' AND
timestamptz '#{end_date}'))
.group("1")
.order('time').limit(LIMIT)
For example:
I have one publication at 2016-03-15 10:19:24.219258 (that's how it is stored inside the database, therefore UTC time).
I create the following query:
SELECT date_trunc('hour',
published_at::timestamptz at time zone interval '+01:00')::timestamptz as time,
count(published_at) FROM "publications" WHERE (published_at BETWEEN
timestamptz '2016-03-15 10:00:00 +0100' AND
timestamptz '2016-03-15 12:00:00 +0100') GROUP BY 1
;
Which results in:
time | count
------------------------+-------
2016-03-15 10:00:00+01 | 1
(1 row)
Which should be:
time: "2016-03-15 10:00:00 UTC" or "2016-03-15 11:00:00+01" ( i don't care about the time zone representation but this is simply the wrong result)
Anybody knows what I am doing wrong here?
The main problem I got stuck is that I want to be able to group/aggregate publications per day, with respect to the time zone of the user requesting the query.
I don't care which time zone is returned as the front end can transform it to the user time zone.
Any feedback, help, or answer is highly appreciated.
Many thanks
Thanks to the discussion I had with devanand, one solution is to split up the code and handle the daily interval with the query used in the question.
For the other intervals I use the following query:
Publication.select(
%{date_trunc('#{interval}',
published_at::timestamptz) as time,
count(published_at)})
.where(%(published_at BETWEEN
timestamptz '#{start_date}' AND
timestamptz '#{end_date}'))
.group('1')
.order('time').limit(LIMIT)
I am not happy with this solution, though, as it feels more like a workaround to me.
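For the daily interval specifically, a common Postgres pattern is to convert to the user's wall-clock time before truncating, so that day boundaries fall on the user's local midnight. A sketch, assuming published_at is a timestamp without time zone holding UTC values and with 'Europe/Vienna' standing in for the requesting user's zone:
SELECT date_trunc('day', published_at AT TIME ZONE 'UTC' AT TIME ZONE 'Europe/Vienna') AS local_day,
       count(*) AS publications
FROM publications
GROUP BY 1
ORDER BY 1;
The first AT TIME ZONE tags the stored value as UTC, the second shifts it to local wall-clock time, and date_trunc then truncates at the local midnight.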

Get the difference of two times and output it in decimal format

I am trying to get the difference between two times, let's say 2:00 PM and 12:00 AM. I want to get how many hours are between those two times, but have it be in decimal format, which in this case would be 10.00 hours. I am not sure how to go about this. The furthest I got was subtracting the two times and multiplying that decimal number by 24, which works if I do 2 PM and 11 PM (giving me 9.00 hours), but as soon as I go to 2 PM and 12 AM it should show 10.00 hours but shows -14.
Assuming your times are in Column A ("earlier") and Column B ("later"), then:
=if(B1=0,(B1-A1+1)*24,(B1-A1)*24)
should work for you. The quotes are there because (it seems to depend on how the time values are entered) Google may associate a date with the times even when that is not displayed. Google treats noon as 12:00 PM, which is arguably wrong: it is noon, not after noon (post meridiem), though one minute later being 12:01 PM, etc., does make sense. So 12:00 AM is midnight, and a special case where a date is associated, because it is seen as midnight of the previous day; it counts as 0, not 24. Hence, relative to 2 PM, it is 14 hours earlier (your result), whereas midnight tonight is 10 hours ahead (the result you expected).
The formula above checks whether the later time is midnight and compensates for it being treated as the previous day by adding 1 (one day) in the formula.
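As a worked check (Sheets stores times as fractions of a day): 2:00 PM in A1 is 14/24 = 0.5833..., and 12:00 AM in B1 is 0, so the IF branch fires and (0 - 0.5833... + 1) * 24 = 10.00 hours; with 11:00 PM in B1 instead, (0.9583... - 0.5833...) * 24 = 9.00 hours, matching the question.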
