Using InfluxDB with interpolate.linear does not output missing values - influxdb

I have some monthly counter measurements stored inside an InfluxDB instance, e.g. data like this (in line protocol):
readings,location=xyz,medium=Electricity,meter=mainMeter energy=13660 1625322660000000000
readings,location=xyz,medium=Electricity,meter=mainMeter energy=13810 1627839610000000000
These are monthly readings, not aligned to the start of a month (one is on the 3rd of July, the other on the 1st of August).
My goal is to interpolate these readings on a daily basis, so I stumbled upon the sparsely documented interpolate.linear function from Flux (https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/interpolate/linear/).
But the only output my query returns is the two data points I put in, with no interpolated values in between.
import "interpolate"
from(bucket: "ManualInput")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "readings")
|> filter(fn: (r) => r["_field"] == "energy")
|> interpolate.linear(every: 1d)
Am I missing something here? I expected a linearly interpolated value for each day... or is this not possible with Flux? (I'm using v2.0.7)

I propose adding a yield() function at the end of the query.
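With that change, the query from the question would look like this (a sketch; bucket, measurement, and field names are taken from the question):

```flux
import "interpolate"

from(bucket: "ManualInput")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "readings")
|> filter(fn: (r) => r["_field"] == "energy")
|> interpolate.linear(every: 1d)
// without an explicit yield, the UI may re-aggregate the result
// and drop the interpolated rows
|> yield()
```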

Take the median of a grouped set

I am quite new to Flux and want to solve an issue:
I got a bucket containing measurements, which are generated by a worker-service.
Each measurement belongs to a site and has an identifier (uuid). Each measurement contains three measurement points containing a value.
What I want to achieve is the following: create a graph/list/table of measurements for a specific site and aggregate the median value of each of the three measurement points per measurement.
TLDR;
Get all measurement points that belong to the specific site-uuid
As each measurement has a uuid and contains three measurement points, group by measurement and take the median for each
Return a result that only contains the median value for each measurement
This does not work:
from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "lighthouse")
|> filter(fn: (r) => r["_field"] == "speedindex")
|> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
|> group(columns: ["measurement"])
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
This does not throw an error, but of course it does not take the median of the specific groups.
This is the result (simple table):
If I understand your question correctly, you want a single number to be returned.
In that case you'll want to use the |> mean() function:
from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "lighthouse")
|> filter(fn: (r) => r["_field"] == "speedindex")
|> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
|> group(columns: ["measurement"])
|> mean()
|> yield(name: "mean")
The aggregateWindow function aggregates your values over (multiple) windows of time. The script you posted computes the mean over each v.windowPeriod (in this case 20 minutes).
I am not entirely sure what v.windowPeriod represents, but I usually use time literals for all times (including start and stop); I find it easier to understand how the query relates to the result that way.
On a side note: the yield function only renames your result and allows you to have multiple returning queries, it does not compute anything.
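Since the question asks for the median rather than the mean, the same pipeline with Flux's median() aggregate should give the per-group medians (a sketch; all names are taken from the question):

```flux
from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "lighthouse")
|> filter(fn: (r) => r["_field"] == "speedindex")
|> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
|> group(columns: ["measurement"])
// median() aggregates each group down to a single median value
|> median()
|> yield(name: "median")
```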

Find the correct grouping

I'll first try to describe the non-technical problem:
There are many services, each service can have multiple instances, and each of those instances is scraped for data that is stored in InfluxDB. The point in time at which the data is scraped from each instance of a service is (obviously) not exactly the same.
What I would like to query (or display) is the maximum value for each service, over all instances. I did not find a way to "quantize" the time points, for example to always move a value to the next full minute or similar, so that the time scales become comparable.
Now the technical problem: the reported values are all totals, so to get a sense of change in those values I need difference() or derivative(). But in my case these functions are often applied to one value from instance 1 and one value from instance 2, which reflects the difference between the instances and not the difference between two points in time.
Here's what I tried so far, but it gives me almost flat lines, since the instances report pretty much the same value but at alternating points in time.
from(bucket: "dpl4")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "prometheus")
|> filter(fn: (r) => r._field == "http_server_requests_seconds_sum")
|> group(columns: ["service"], mode: "by")
|> aggregateWindow(every: 1m, fn: max)
|> derivative(unit: 1m, nonNegative: true)
I hope I was able to describe the problem.
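The "move values to the next full minute" idea described above can be sketched like this: align each instance's series onto a common 1m grid first (while the data is still grouped per instance), take the per-instance derivative, and only then group by service and take the maximum. This is a sketch, not a tested solution; it assumes the default grouping keeps each instance in its own table before the explicit group() call:

```flux
from(bucket: "dpl4")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "prometheus")
|> filter(fn: (r) => r._field == "http_server_requests_seconds_sum")
// snap each instance's samples to full minutes, per instance
|> aggregateWindow(every: 1m, fn: last, createEmpty: false)
// per-instance rate of change, so values from different instances never mix
|> derivative(unit: 1m, nonNegative: true)
// now the instances share a time grid and can be compared per service
|> group(columns: ["service"], mode: "by")
|> aggregateWindow(every: 1m, fn: max)
```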

How to filter out only positive values in InfluxDB?

I've got an InfluxDB database with measurements of Grid power usage. The Grid power is negative when our Solar PV is not enough to power the house and we are importing from the grid. Likewise, the measurement is positive when we've got surplus Solar PV power and are exporting to the grid.
Now I would like to calculate (perhaps using integral()) the cost of power exported separately from the cost of power imported. Because there are different rates I can't simply integrate it all together; I need the above-zero and below-zero parts considered separately to calculate the energy in kWh and subsequently the cost in each direction.
I was hoping to use InfluxDB min() and max() but that seem to select the min/max value from a given interval, not quite what I need I think.
Can I somehow split this measurement into two for further calculations?
I'm on InfluxDB 1.8 but considering an upgrade to 2.x eventually.
Since you are going to upgrade to 2.x, you could try Flux.
In v1.8, you can turn on Flux following this doc.
Use the filter operator to filter the positive and negative values, then apply the integral function.
Power Surplus:
from(bucket: "yourDatabaseName/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "yourMeasurementName" and r._field == "yourFieldName")
|> filter(fn: (r) => r._value > 0)
|> integral(unit: 10s)
Power Deficit:
from(bucket: "yourDatabaseName/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "yourMeasurementName" and r._field == "yourFieldName")
|> filter(fn: (r) => r._value < 0)
|> integral(unit: 10s)
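Both directions can also be computed in one script by storing the shared part of the pipeline in a variable and giving each branch its own yield() name (a sketch, reusing the placeholder names from above):

```flux
data = from(bucket: "yourDatabaseName/autogen")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "yourMeasurementName" and r._field == "yourFieldName")

// exported energy (surplus)
data
|> filter(fn: (r) => r._value > 0)
|> integral(unit: 10s)
|> yield(name: "surplus")

// imported energy (deficit)
data
|> filter(fn: (r) => r._value < 0)
|> integral(unit: 10s)
|> yield(name: "deficit")
```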

How does Influxdb sort and draw the line by default?

I have a simple query for fetching some data like:
from(bucket: "almon")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "binary")
|> filter(fn: (r) => r["_field"] == "duration_mili")
|> group()
|> yield(name: "mean")
and the graph I get is
What I don't understand is why the data points are sorted by _time, but the actual line does not seem to follow that order. After exploring the data it seems like the line is drawn in the order of the sorted tags. Why is that, and is it documented somewhere? What influences the logic for drawing the line on the graph?
By default InfluxDB returns data grouped by measurement+tags and then within those groups it is sorted by _time.
Because you called group() you removed that default grouping, but that doesn't force a re-sorting, so you still have the data ordered by groups even though it's no longer separated by groups.
If you add |> sort(columns: ["_time"]) after your group() that should take care of things for you.
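Applied to the query from the question, that would look like this (a sketch):

```flux
from(bucket: "almon")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "binary")
|> filter(fn: (r) => r["_field"] == "duration_mili")
|> group()
// re-sort the merged table by time so the line is drawn chronologically
|> sort(columns: ["_time"])
|> yield(name: "mean")
```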

How to create Influxdb alert for deviating from average hourly values?

So I'm trying to find documentation on more complex Flux queries, but after days of searching I'm still lost. I want to be able to calculate average values for each hour of the week and then, when new data comes in, check whether it deviates by x standard deviations for that hour.
Basically I want to have a 24x7 array of fields, each representing the mean/median value for one hour of the week over the last year. Then I want to compare yesterday's value for each hour against these averages and report an error. I do not understand how to calculate these averages. Is there some hidden extensive documentation on Flux?
I don't really need a full solution, just some direction would be nice. Like, are there some utility functions for this in the standard lib or whatever?
EDIT: After some reading, it really looks like all I need to do is use the window and aggregateWindow functions, but I haven't yet found out how exactly.
OK, so this is what worked for me. It needs some cleaning up, but it successfully groups the values per hour+weekday and computes the mean of all values in each group:
import "date"
tab1 = from(bucket: "qweqwe")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "asdasd")
|> filter(fn: (r) => r["_field"] == "reach")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
mapped = tab1
|> map(fn: (r) => ({ r with wd: string(v: date.weekDay(t: r._time)), h: string(v: date.hour(t: r._time)) }))
|> map(fn: (r) => ({ r with mapped_time: r.wd + " " + r.h }))
grouped = mapped
|> group(columns: ["mapped_time"], mode: "by")
|> mean()
|> group()
|> toInt()
|> yield()
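For the comparison step the question asks about, one possible sketch (not part of the answer above): build the same weekday+hour key for the most recent day, join it with the per-hour baseline, and flag rows that deviate too far. The 1.5/0.5 tolerance factors are made-up placeholders; a stddev() baseline could be joined in the same way:

```flux
import "date"

// one-year per-hour-of-week baseline, keyed by "weekday hour"
baseline = from(bucket: "qweqwe")
|> range(start: -1y)
|> filter(fn: (r) => r["_measurement"] == "asdasd")
|> filter(fn: (r) => r["_field"] == "reach")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
|> map(fn: (r) => ({r with mapped_time: string(v: date.weekDay(t: r._time)) + " " + string(v: date.hour(t: r._time))}))
|> group(columns: ["mapped_time"], mode: "by")
|> mean()

// yesterday's hourly values, keyed the same way
latest = from(bucket: "qweqwe")
|> range(start: -1d)
|> filter(fn: (r) => r["_measurement"] == "asdasd")
|> filter(fn: (r) => r["_field"] == "reach")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
|> map(fn: (r) => ({r with mapped_time: string(v: date.weekDay(t: r._time)) + " " + string(v: date.hour(t: r._time))}))
|> group(columns: ["mapped_time"], mode: "by")

// join on the hour-of-week key; _value_b is the baseline, _value_l the latest value
join(tables: {b: baseline, l: latest}, on: ["mapped_time"])
|> map(fn: (r) => ({r with deviates: r._value_l > r._value_b * 1.5 or r._value_l < r._value_b * 0.5}))
```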