InfluxDB and Grafana graph using midnight as 0 on Y-axis derivative - influxdb

I am graphing with Grafana (2.6.0) and I have an InfluxDB (0.10.2) database with the following data in it:
> select * from "WattmeterMainskwh" where time > now() - 5m
name: WattmeterMainskwh
-----------------------
time value
1457579891000000000 15529.322
1457579956000000000 15529.411
1457580011000000000 15529.425
1457580072000000000 15529.460
1457580135000000000 15529.476
...etc...
This data collects my household kilowatt usage as measured by a kWH gauge that steadily increments the usage value across months or years. I cannot easily reset the counter, nor do I wish to do so.
My goal is to create a graph that shows my daily kWH use over 24 hour periods starting at midnight, or at a minimum showing relative kWH over the interval displayed. This type of graph would be useful in many other circumstances as well where I could imagine "errors across the day" or "visitors since opening time" or "BGP resets per calendar week" were useful but the collection counter was not reset to zero upon the reset or turn-over of the time interval. This kind of counting is actually quite common in my experience.
This graph works, but doesn't show me what I'm looking for:
SELECT derivative(mean("value")) FROM "WattmeterMainskwh" WHERE $timeFilter GROUP BY time($interval) fill(null)
That graph just shows the difference between one sample and the previous sample. What I want is a steadily increasing line starting from the left side of the graph and increasing towards the right side of the graph, with zero as the bottom of the Y axis, and the graph starting at zero at the farthest left X value.
This graph works too and shows me the correct curve, but it's off by fifteen thousand or so. So far, it's the closest to what I want but since this is an ever-increasing counter that can't be reset I need to subtract some from the Y axis. Ideally, I'd like to subtract whatever the value was at the previous midnight from each sample to get a relative number based on a day instead of an absolute based on all time.
SELECT sum("value") FROM "WattmeterMainskwh" WHERE $timeFilter GROUP BY time($interval) fill(null)
And here's the graph from that previous statement:
Graph that is off by 15k
This attempt didn't work - I apparently can't take a sum of a derivative group:
SELECT sum(derivative(mean("value"))) FROM "WattmeterMainskwh" WHERE $timeFilter GROUP BY time($interval) fill(null)
This doesn't work, either - I can't perform functions within "derivative":
SELECT derivative(sum("value")-first("value")) FROM "WattmeterMainskwh" WHERE $timeFilter GROUP BY time($interval) fill(null)
Of course, I could just create a new value that had calculations applied to it before I wrote it into InfluxDB, but that seems to me to be a data-redundant and sloppy way to solve this problem, as well as being quite inflexible if I want to look at other intervals on a whim. I'm hoping that there is some way to do this more elegantly within the combination of InfluxDB & Grafana, but I'm just not able to find it with the search terms I've used or the thinking I've put towards interpreting the documentation.
Is this type of graph even possible with InfluxDB/Grafana? As far as I can tell a continuous query is not a solution, and the lack of nested SELECTs makes even the hackish ways of doing this not obvious to me.
BONUS: It would be really great to have the graph show midnight every night as a "zero" location, instead of "zero" being the first point in the displayed interval, so looking at five days of normal data would show five distinct "waves" of increasing daily aggregate energy usage, with the wave Y value going back down to zero at 12:00:01 on each day. But I'll take whatever I can get.

Nested functions have only partial support. However, you can effectively nest functions by chaining Continuous Queries.
Use a CQ to calculate the derivative(mean(value)) and store that in a new measurement foo. Then for your graph you can query select sum(value) from foo.

(I know this answer is quite late, but it might help others. Oh, and please excuse me for all the Dutch in my graphs; I had to keep it in dutch for the highest possible WAF)
You could do what I do for my kWh calculations:
Which results in a simple query like this:
SELECT distinct("kwh_combined") FROM "smartmeter" WHERE $timeFilter GROUP BY time($__interval) fill(linear)
In order to get your total count.. or if you want it in a nice graph like this which shows the number of kWh's used per hour in the bars and the yellow line (I normally run in dark mode, excuse the yellow) which is my current WATT power draw:
This data (or at least your hourly usage in bars) can be retrieved by a query like this:
Which is this exact query (for B):
SELECT spread("kwh_combined") FROM "smartmeter" WHERE $timeFilter GROUP BY time(1h) fill(null)
... where the 'kwh_combined' is (still) my counter just counting up and up.
All this results in me being able to 'query' the InfluxDB for a certain time period, like "last 24 hours" to come up with a nice panel like this: (ignore the encircled prices, that was for a question I posted I just made 10 minutes ago, check my PS)
I hope this helps you or anyone else; it took me some figuring out, but I'm happy to give something back to the community :)
PS: Don't be as stupid as I was and hardcode your electrical and gas prices into your dashboard but store them with your measurements as they could change over time.

I had the same problem (same application even) and solved it here. In your case, the query should be roughly:
SELECT value-value_fill FROM
(SELECT first(value) as value_fill FROM WattmeterMainskwh WHERE time>now()-7d GROUP BY time(1d)),
(SELECT first(value) as value FROM WattmeterMainskwh WHERE time>now()-7d GROUP BY time(1h))
fill(previous)

Related

Influxdb speed up query over long time periods with group by

i write sensor data every second to an influxdb database. Displaying weekly, monthly or yearly summaries in grafana is quite slow since it needs to query many thousand values.
To speed things up, i was thinking about using a cron job to run a queries like
select mean(sensor1) into data_avg_1h from data where time > start and time <= end group by time(1h)
select mean(sensor1) into data_avg_1d from data where time > start and time <= end group by time(1d)
select mean(sensor1) into data_avg_1w from data where time > start and time <= end group by time(1w)
This would mean i need more storage, but queries run much faster.
Is this a bodge job or acceptable and is there a more clever way to do something like that?
Yes. It is perfectly ok and it is also recommended to downsample the data like you have mentioned in the question.
However, instead of using a cronjob it will be better to use Continuous query feature of InfluxDB to achieve the same result.
Downsampling & Contious Query Documentation.
Please be aware that when storing the average value for short period, if you want to calculate the average for a longer period from this downsampled data you will have to calculate the weighted average. Otherwise, you will calculating the average of average which, may not be equal to the average value calculated from the Original data.
This is because, each downsampled average value might be having different number of datapoints.
So while calculating the mean on regular interval store the number of data points received in that interval. This way you will be able to calculate the weighted average.

How do I calculate average tons per hour by driver in Google Sheets?

https://docs.google.com/spreadsheets/d/1SiUfqrJNHPAYjibeNBdzWQEcuzka5srf7mSHAv_bn5k/edit?usp=sharing
What would a formula look like to calculate the average tons per hour by driver in this example spreadsheet? Correcting for long times or even days between loads.
We're being charged on an hourly basis for freight so I'd like to figure out which drivers are the most efficient.
It's been tricky because the only concrete source of information we have is the scale tickets. So if they only do a single load in a day or go several hours between loads then the data would be skewed if you use a simple metric like time elapsed.
Also, I'll need the time elapsed between rows (not just the difference between Time In and Time Out) unless that time is > 1.5 hours. So something like:
=(TIMEVALUE(E3)-TIMEVALUE(D2))*24
...With some added logic to not include anything over 1.5 hours.
If a pivot table would be better than a lengthy formula, that's fine with me.
Here's an example for some added context: Driver Cody goes to Farm Nic to receive a load of hay, then comes back to the weigh station (Ticket, Time In, Gross are then determined), dumps the load, comes back to weigh again empty (Tare, Net, and Time Out are determined here), and goes back to Farm Nic until all the hay is harvested. Then it's on to Farm Zach and Farm Williams to repeat the process. There are several Drivers going at a time, which can be seen if the spreadsheet is sorted by Ticket. My goal is to figure out how many Tons each driver delivers per hour. The time elapsed would include the time between Tickets, because Time In and Time Out just show the time elapsed between coming in with a load of hay and leaving to go back to the field. To get a true measure of tons delivered per hour, you'd need to include the time between tickets, but also remove any instance where that time is greater than 1.5 hours. That will account for circumstances where the Driver isn't working and we aren't being billed, such as during equipment breakdowns.
I'm not much of a formulas guy so I hope this suffice your needs.
First I added a column to your sheet, to calculate how many amount of hours is taking for every single row, to do that I made use of the TIMEVALUE function:
=(TIMEVALUE(E2)-TIMEVALUE(D2))*24
Now you just need to get all the driver's hours and tons and make the quotient total_tons / total_hours. For that they may be some other functions that would do the job, myself I have used QUERY:
=QUERY(Sheet1!A:M, "select C, sum(I), sum(M), sum(I) / sum(M) group by C", 1)
I think pretty straightforward query, group all the data by C (Driver's name) and then sum the column I (tons) and the column M (hours).
With the following result:
The format may be a little off but you can change it as mush as you want. You can copy or play with the sheet
EDIT
After you change your requirements I made a change to my formula to calculate the hours worked:
=IF(
AND(
C3=C2,
A3=A2,
IFERROR(
(TIMEVALUE(E3)-TIMEVALUE(D2))*24) <= 1.5,
TRUE
),
(TIMEVALUE(E3)-TIMEVALUE(D2))*24,
(TIMEVALUE(E2)-TIMEVALUE(D2))*24
)
Let me explain here, before there was a much simpler formula but now having multiple rows that we need to check makes the formula more lenghty.
First with the IF and AND statement we check if the next row has:
The same day (A3=A2)
The same Driver (C3=C2)
Less than an hour and a half of difference (TIMEVALUE(E3)-TIMEVALUE(D2))*24) <= 1.5)
And also because the last row throws an error trying to TIMEVALUE an empty column I had to add the IFERROR
After that the TRUE condition (same day, same driver, under 1.5h hours difference) will calculate from the current Time in (D2) to the next Time in (D3):
(TIMEVALUE(D3)-TIMEVALUE(D2))*24
And in the FALSE statement we do the same we were doing before:
(TIMEVALUE(E2)-TIMEVALUE(D2))*24
The QUERY function stays the same. And the results have decreased drastically:
If you have any doubts you can go ahead and see the sheet

How to measure throughput with dynamic interval in Grafana

We are measuring throughput using Grafana and Influx. Of course, we would like to measure throughput in terms how many requests, approximately, happens every single second (rps).
The typical request is:
SELECT sum("count") / 10 FROM "http_requests" GROUP BY time(10s)
But we are loosing possibility to use astonishing dynamic $__interval that very useful when graph scope is large, like a day of week. When we are changing interval we should change divider into SELECT expression.
SELECT sum("count") / $__interval FROM "http_requests" GROUP BY time($__interval)
But this approach does not work, because of empty result returns.
How to create request using dynamic $__interval for throughput measuring?
The reason you get no results is that $__interval is not a number but a string such as 10s, 1m, etc. that is understood by influxdb as a time range. So it is not possible to use it the way you are trying.
However, what you want to calculate is the mean which is available as a function in InfluxQL. The way to get the behavior that you want is with something like this.
SELECT mean("count") FROM "http_requests" GROUP BY time($__interval)
EDIT: On a second thought that is not quite what you want.
You'd probably need to use derivative. I'll come back to you on that one later.
Edit2: Do you think this answers the question that you have Calculating request per second using InfluxDB on Grafana
Edit3: Third edit's a charm.
We use your starting query and wrap it in another one as such:
SELECT sum("rps") from (SELECT sum("count") / 10 as rps FROM "http_requests" GROUP BY time(10s)) GROUP BY time($__interval)

Offsetting a graph by the first value of the shown time span

I have a graph of an energy meter in Grafana which shows the value of the consumed active energy over the selected time span.
This is a relatively new meter, a few months old, so the highest value it is currently showing is around 1570.3 kWh.
The interval shown in the image above is over the course of 24h, so it starts at 1568.1 kWh.
I want to offset the entire graph by 1568.1 kWh, so that the beginning of the graph is at 0 kWh and the end at 2200 Wh (~ 91 Wh per hour in average over 24 h).
It should always adjust when I change the selected time span, so that I can get a good overview of the daily, weekly or monthly consumption.
How do I archive this?
I read that using something like SELECT integral(derivative(max("in-value"))) ... would do the job, but I didn't get it to work. Also, I believe that just adding a SELECT max("in-value") - first_value_of_timespan("in-value") ... would be more precise and efficient, but such a method first_value_of_timespan does not exist.
The solution is to take the difference between the current interval and the next one (there are many small intervals in the shown time span), and then to do a cumulative_sum over all the differences of the time range.
In the specific case shown in the question the solution would be
SELECT cumulative_sum(difference(max("in-total"))) FROM "le.e6.haus.strom.zähler.hausstrom-solar" WHERE $timeFilter GROUP BY time($__interval) fill(previous)

InfluxDB integral gives high value after missing data

I am storing Amps, Volts and Watts in influxdb in a measurement/table called "Power". The frequency of update is approx every second. I can use the integral function to get power usage (in Amp Hour or Watt Hour) on an hourly basis. This is working very nicely, so I can get a graph of power used each hour over a 24 hour period. My SQL is below.
The issue is if there is a gap in the data then I get a huge spike in the result when it returns. eg if data was missing from 3pm to 5.45 pm, then the 5 pm result shows a huge spike. Reason I can see is there is close to 3 hours gap, so it just calculates the area under the graph and lumps it into the 5 PM value. Can I avoid that?
SELECT INTEGRAL(Watts) FROM Power WHERE time > now() - 24h GROUP BY time(1h)
I had a similar issue with Influx. It turns out that integral() doesn't support fill() as noted by Yuri Lachin in the comments.
Since you're grouping by hours anyway, then the average value of the power (watts) for the hour happens to be equal to the energy consumption for the hour (watt-hours), so you can use the mean() value here and you should get the correct result.
The query I'm using is:
SELECT mean("load_power") AS "load"
FROM "power_readings"
WHERE $timeFilter
GROUP BY time(1h) fill(0)
For daily numbers I can go back to using integral() because I rarely have gaps of data that span multiple days, so there's no filler needed.
Since you can use the fill() function in this query, you can decide what makes the most sense of the various options (see https://docs.influxdata.com/influxdb/v1.7/query_language/data_exploration/#group-by-time-intervals-and-fill)
You need to use fill() in group by section of query (see docs for Influx fill() usage).
In your case fill(none) or fill(0) should do the job.

Resources