influxdb : Display over a 24 hour period whether a machine is off, idle or on - influxdb

I’m new to influx and grafana and want to know in terms of percentage over a 24 hour period on whether the machine is off, idle or on. This is a IOT project and we're recording the state (off, idle on) by the power used. The data ends up being stored in influx under state as either 0 = off, 1 = idle and 2 = on. How can I achieve this in influx query or Grafana? Any pointers or help is appreciated.

I would use the pie chart plugin to show the three states and respective percentages over time. You can do something like:
SELECT mean("value") FROM "state" WHERE value=[0|1|2] GROUP BY time($__interval) fill(null)
For each part of the chart.

Have a look at grafana-discrete-panel (also available in official grafana plugin repository). It is designed specially to visualize state changes in time.

Related

Offsetting a graph by the first value of the shown time span

I have a graph of an energy meter in Grafana which shows the value of the consumed active energy over the selected time span.
This is a relatively new meter, a few months old, so the highest value it is currently showing is around 1570.3 kWh.
The interval shown in the image above is over the course of 24h, so it starts at 1568.1 kWh.
I want to offset the entire graph by 1568.1 kWh, so that the beginning of the graph is at 0 kWh and the end at 2200 Wh (~ 91 Wh per hour in average over 24 h).
It should always adjust when I change the selected time span, so that I can get a good overview of the daily, weekly or monthly consumption.
How do I archive this?
I read that using something like SELECT integral(derivative(max("in-value"))) ... would do the job, but I didn't get it to work. Also, I believe that just adding a SELECT max("in-value") - first_value_of_timespan("in-value") ... would be more precise and efficient, but such a method first_value_of_timespan does not exist.
The solution is to take the difference between the current interval and the next one (there are many small intervals in the shown time span), and then to do a cumulative_sum over all the differences of the time range.
In the specific case shown in the question the solution would be
SELECT cumulative_sum(difference(max("in-total"))) FROM "le.e6.haus.strom.zähler.hausstrom-solar" WHERE $timeFilter GROUP BY time($__interval) fill(previous)

influxdb group by from most recent time

Using "group by" to bin 5m intervals. The way it works now it will group from 15min to just under 20min and this will be give a 15 timestamp.
Is it possible to group from 20 to just above 15 and give it a 20 min timestamp.
thank you
Unfortunately, no: the time would always be the beginning of each time group.
I mean, you can group like that with ORDER BY time DESC - for some functions (like DIFFERENCE()) that does matter - but the timestamp still be the beginning.
That seems to be a feature of InfluxDB (although I haven't found mention of that in InfluxDB docs).
PS In Kapacitor, you can use the time of the selected point instead of the time group interval beginning - but only for selector functions, not aggregates

InfluxDB and Grafana graph using midnight as 0 on Y-axis derivative

I am graphing with Grafana (2.6.0) and I have an InfluxDB (0.10.2) database with the following data in it:
> select * from "WattmeterMainskwh" where time > now() - 5m
name: WattmeterMainskwh
-----------------------
time value
1457579891000000000 15529.322
1457579956000000000 15529.411
1457580011000000000 15529.425
1457580072000000000 15529.460
1457580135000000000 15529.476
...etc...
This data collects my household kilowatt usage as measured by a kWH gauge that steadily increments the usage value across months or years. I cannot easily reset the counter, nor do I wish to do so.
My goal is to create a graph that shows my daily kWH use over 24 hour periods starting at midnight, or at a minimum showing relative kWH over the interval displayed. This type of graph would be useful in many other circumstances as well where I could imagine "errors across the day" or "visitors since opening time" or "BGP resets per calendar week" were useful but the collection counter was not reset to zero upon the reset or turn-over of the time interval. This kind of counting is actually quite common in my experience.
This graph works, but doesn't show me what I'm looking for:
SELECT derivative(mean("value")) FROM "WattmeterMainskwh" WHERE $timeFilter GROUP BY time($interval) fill(null)
That graph just shows the difference between one sample and the previous sample. What I want is a steadily increasing line starting from the left side of the graph and increasing towards the right side of the graph, with zero as the bottom of the Y axis, and the graph starting at zero at the farthest left X value.
This graph works too and shows me the correct curve, but it's off by fifteen thousand or so. So far, it's the closest to what I want but since this is an ever-increasing counter that can't be reset I need to subtract some from the Y axis. Ideally, I'd like to subtract whatever the value was at the previous midnight from each sample to get a relative number based on a day instead of an absolute based on all time.
SELECT sum("value") FROM "WattmeterMainskwh" WHERE $timeFilter GROUP BY time($interval) fill(null)
And here's the graph from that previous statement:
Graph that is off by 15k
This attempt didn't work - I apparently can't take a sum of a derivative group:
SELECT sum(derivative(mean("value"))) FROM "WattmeterMainskwh" WHERE $timeFilter GROUP BY time($interval) fill(null)
This doesn't work, either - I can't perform functions within "derivative":
SELECT derivative(sum("value")-first("value")) FROM "WattmeterMainskwh" WHERE $timeFilter GROUP BY time($interval) fill(null)
Of course, I could just create a new value that had calculations applied to it before I wrote it into InfluxDB, but that seems to me to be a data-redundant and sloppy way to solve this problem, as well as being quite inflexible if I want to look at other intervals on a whim. I'm hoping that there is some way to do this more elegantly within the combination of InfluxDB & Grafana, but I'm just not able to find it with the search terms I've used or the thinking I've put towards interpreting the documentation.
Is this type of graph even possible with InfluxDB/Grafana? As far as I can tell a continuous query is not a solution, and the lack of nested SELECTs makes even the hackish ways of doing this not obvious to me.
BONUS: It would be really great to have the graph show midnight every night as a "zero" location, instead of "zero" being the first point in the displayed interval, so looking at five days of normal data would show five distinct "waves" of increasing daily aggregate energy usage, with the wave Y value going back down to zero at 12:00:01 on each day. But I'll take whatever I can get.
Nested functions have only partial support. However, you can effectively nest functions by chaining Continuous Queries.
Use a CQ to calculate the derivative(mean(value)) and store that in a new measurement foo. Then for your graph you can query select sum(value) from foo.
(I know this answer is quite late, but it might help others. Oh, and please excuse me for all the Dutch in my graphs; I had to keep it in dutch for the highest possible WAF)
You could do what I do for my kWh calculations:
Which results in a simple query like this:
SELECT distinct("kwh_combined") FROM "smartmeter" WHERE $timeFilter GROUP BY time($__interval) fill(linear)
In order to get your total count.. or if you want it in a nice graph like this which shows the number of kWh's used per hour in the bars and the yellow line (I normally run in dark mode, excuse the yellow) which is my current WATT power draw:
This data (or at least your hourly usage in bars) can be retrieved by a query like this:
Which is this exact query (for B):
SELECT spread("kwh_combined") FROM "smartmeter" WHERE $timeFilter GROUP BY time(1h) fill(null)
... where the 'kwh_combined' is (still) my counter just counting up and up.
All this results in me being able to 'query' the InfluxDB for a certain time period, like "last 24 hours" to come up with a nice panel like this: (ignore the encircled prices, that was for a question I posted I just made 10 minutes ago, check my PS)
I hope this helps you or anyone else; it took me some figuring out, but I'm happy to give something back to the community :)
PS: Don't be as stupid as I was and hardcode your electrical and gas prices into your dashboard but store them with your measurements as they could change over time.
I had the same problem (same application even) and solved it here. In your case, the query should be roughly:
SELECT value-value_fill FROM
(SELECT first(value) as value_fill FROM WattmeterMainskwh WHERE time>now()-7d GROUP BY time(1d)),
(SELECT first(value) as value FROM WattmeterMainskwh WHERE time>now()-7d GROUP BY time(1h))
fill(previous)

InfluxDB: query to calculate average of StatsD "executionTime" values

I'm sending metrics in StatsD format to Telegraf, which forwards them to InfluxDB 0.9.
I'm measuring execution times (of some event) from multiple hosts. The measurement is called "execTime", and the tag is "host". Once Telegraf gets these numbers, it calculates mean/upper/lower/count, and stores them in separate measurements.
Sample data looks like this in influxdb:
TIME...FIELD..............HOST..........VALUE
t1.....execTime.count.....VM1...........3
t1.....execTime.mean......VM1...........15
t1.....execTime.count.....VM2...........6
t1.....execTime.mean......VM2...........22
(So at time t1, there were 3 events on VM1, with mean execution time 15ms, and on VM2 there were 6 events, and the mean execution time was 22ms)
Now I want to calculate the mean of the operation execution time across both hosts at time t1. Which is (3*15 + 6*22)/(3+6) ms.
But since the count and mean values are in two different series, I can't simply use "select mean(value) from execTime.mean"
Do I need to change my schema, or can I do this with the current setup?
What I need is essentially a new series, which is a combination of the execTime.count and execTime.mean across all hosts. Instead of calculating this on-the-fly, the best approach seems to be to actually create the series along with the others.
So now I have two timer stats being generated on each host for each event:
1. one event with actual hostname for the 'host' tag
2. second event with one tag "host=all"
I can use the first set of series to check mean execution times per host. And the second series gives me the mean time for all hosts combined.
It is possible to do mathematical operations on fields from two different series, provided both series are members of the same measurement. I suspect your schema is non-optimized for your use case.

InfluxDB performance

For my case, I need to capture 15 performance metrics for devices and save it to InfluxDB. Each device has a unique device id.
Metrics are written into InfluxDB in the following way. Here I only show one as an example
new Serie.Builder("perfmetric1")
.columns("time", "value", "id", "type")
.values(getTime(), getPerf1(), getId(), getType())
.build()
Writing data is fast and easy. But I saw bad performance when I run query. I'm trying to get all 15 metric values for the last one hour.
select value from perfmetric1, perfmetric2, ..., permetric15
where id='testdeviceid' and time > now() - 1h
For an hour, each metric has 120 data points, in total it's 1800 data points. The query takes about 5 seconds on a c4.4xlarge EC2 instance when it's idle.
I believe InfluxDB can do better. Is this a problem of my schema design, or is it something else? Would splitting the query into 15 parallel calls go faster?
As #valentin answer says, you need to build an index for the id column for InfluxDB to perform these queries efficiently.
In 0.8 stable you can do this "indexing" using continuous fanout queries. For example, the following continuous query will expand your perfmetric1 series into multiple series of the form perfmetric1.id:
select * from perfmetric1 into perfmetric1.[id];
Later you would do:
select value from perfmetric1.testdeviceid, perfmetric2.testdeviceid, ..., permetric15.testdeviceid where time > now() - 1h
This query will take much less time to complete since InfluxDB won't have to perform a full scan of the timeseries to get the points for each testdeviceid.
Build an index on id column. Seems that he engine uses full scan on table to retrieve data. By splitting your query in 15 threads, the engine will use 15 full scans and the performance will be much worse.

Resources