How to create an InfluxDB alert for deviating from average hourly values? - influxdb

So I'm trying to find documentation on more complex Flux queries, but after days of searching I'm still lost. I want to calculate average values for each hour of the week and then, when new data comes in, check whether it deviates by x standard deviations from the average for that hour.
Basically I want a 24x7 array of fields, each representing the mean/median value for one hour of the week over the last year. Then I want to compare yesterday's values for each hour against these averages and report an error on deviation. I do not understand how to calculate these averages. Is there some hidden extensive documentation on Flux?
I don't really need a full solution, just some direction would be nice. Like, are there some utility functions for this in the standard lib or whatever?
EDIT: After some reading, it really looks like all I need is the window and aggregateWindow functions, but I haven't yet found exactly how to use them.

OK, so this is what worked for me. It needs some cleaning up, but it successfully groups the values per hour + weekday and takes the mean of each group:
import "date"
tab1 = from(bucket: "qweqwe")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "asdasd")
|> filter(fn: (r) => r["_field"] == "reach")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
mapped = tab1
|> map(fn: (r) => ({ r with wd: string(v: date.weekDay(t: r._time)), h: string(v: date.hour(t: r._time)) }))
|> map(fn: (r) => ({ r with mapped_time: r.wd + " " + r.h }))
grouped = mapped
|> group(columns: ["mapped_time"], mode: "by")
|> mean()
|> group()
|> toInt()
|> yield()
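The question also asked for the deviation check itself, which the query above does not do. Below is a minimal sketch of that step, assuming the same bucket/measurement/field placeholders; the threshold x and the helper hourly() are mine, not part of any stock API:

import "date"
import "math"

x = 3.0  // hypothetical alert threshold, in standard deviations

// Hourly values tagged with their hour-of-week key (same trick as above)
hourly = (start) => from(bucket: "qweqwe")
    |> range(start: start)
    |> filter(fn: (r) => r["_measurement"] == "asdasd" and r["_field"] == "reach")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |> map(fn: (r) => ({ r with mapped_time: string(v: date.weekDay(t: r._time)) + " " + string(v: date.hour(t: r._time)) }))

baseline = hourly(start: -1y)
    |> group(columns: ["mapped_time"])

// One row per hour of the week: mean in _value_m, standard deviation in _value_s
stats = join(tables: {m: baseline |> mean(), s: baseline |> stddev()}, on: ["mapped_time"])
    |> group()

// Yesterday's hourly values, kept only where they sit outside mean +/- x stddev
join(tables: {cur: hourly(start: -1d) |> group(), stat: stats}, on: ["mapped_time"])
    |> filter(fn: (r) => math.abs(x: r._value - r._value_m) > x * r._value_s)
    |> yield(name: "deviating_hours")

A non-empty result is the alert condition; the same query could then be wrapped in a task or check.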

Related

Take the median of a grouped set

I am quite new to Flux and want to solve an issue:
I have a bucket containing measurements, which are generated by a worker service.
Each measurement belongs to a site and has an identifier (uuid). Each measurement contains three measurement points, each containing a value.
What I want to achieve is the following: create a graph/list/table of measurements for a specific site and aggregate the median value of the three measurement points per measurement.
TL;DR:
Get all measurement points that belong to the specific site uuid
As each measurement has a uuid and contains three measurement points, group by measurement and take the median for each measurement
Return a result that only contains the median value for each measurement
This does not work:
from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "lighthouse")
|> filter(fn: (r) => r["_field"] == "speedindex")
|> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
|> group(columns: ["measurement"])
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
This does not throw an error, but of course it does not take the median of the specific groups; the result is just a simple table.
If I understand your question correctly, you want a single number returned per group. In that case you'll want the |> median() function (|> mean() works exactly the same way if you want the mean instead):
from(bucket: "test")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "lighthouse")
|> filter(fn: (r) => r["_field"] == "speedindex")
|> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
|> group(columns: ["measurement"])
|> mean()
|> yield(name: "mean")
The aggregateWindow function aggregates your values over (multiple) windows of time. The script you posted computes the mean over each v.windowPeriod (in this case 20 minutes).
I am not entirely sure what v.windowPeriod represents, but I usually use time literals for all times (including start and stop); I find it easier to understand how the query relates to the result that way.
On a side note: the yield function only names your result and allows you to have multiple returning queries; it does not compute anything.
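If you do want one median per time window rather than a single number per group, keep aggregateWindow and swap in the median aggregate; a minimal sketch, reusing the question's bucket and filters:

from(bucket: "test")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "lighthouse")
    |> filter(fn: (r) => r["_field"] == "speedindex")
    |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
    |> group(columns: ["measurement"])
    |> aggregateWindow(every: v.windowPeriod, fn: median, createEmpty: false)
    |> yield(name: "median_per_window")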

InfluxDB: apply function on a 5-minute slice of data to generate the output stream

I'm a beginner with InfluxDB. I'm trying to solve the following problem:
I have 2 streams of data from 2 temperature sensors. I need to get a stream of correlations, one for each 5-minute slice, so an output stream with 36 values in this case (last 3 hours) that I can plot on a graph.
My script (that I tried in the script editor) is:
t1 = from(bucket: "sensor1")
    |> range(start: -3h)
    |> filter(fn: (r) => r["_measurement"] == "temp1" and r["_field"] == "avg")
t2 = from(bucket: "sensor2")
    |> range(start: -3h)
    |> filter(fn: (r) => r["_measurement"] == "temp2" and r["_field"] == "avg")
t3 = cov(x: t1, y: t2, on: ["_time"], pearsonr: true)
    |> yield(name: "cov")
If I execute the above (in the InfluxDB 2.4 script editor) I get the Pearson correlation calculated over the whole range (a single value).
I tried to figure out the syntax by using the query builder and then switching to the script editor to see the generated code, but I failed.
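No answer was posted here, but one plausible direction (a sketch, untested): window both streams into the same 5-minute slices before calling cov, so the correlation is computed per window rather than over the whole range. cov joins on _time, so the two sensors must report at matching timestamps; if they don't, align each stream first with aggregateWindow(every: 1m, fn: mean).

t1 = from(bucket: "sensor1")
    |> range(start: -3h)
    |> filter(fn: (r) => r["_measurement"] == "temp1" and r["_field"] == "avg")
    |> window(every: 5m)  // one table per 5-minute slice
t2 = from(bucket: "sensor2")
    |> range(start: -3h)
    |> filter(fn: (r) => r["_measurement"] == "temp2" and r["_field"] == "avg")
    |> window(every: 5m)

cov(x: t1, y: t2, on: ["_time", "_start", "_stop"], pearsonr: true)
    |> duplicate(column: "_stop", as: "_time")  // give each slice a plottable timestamp
    |> window(every: inf)                       // merge the slices back into one series
    |> yield(name: "pearson_5m")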

Using InfluxDB with interpolate.linear does not output missing values

I have some monthly counter measurements stored inside an InfluxDB instance, e.g. data like this (in line protocol):
readings,location=xyz,medium=Electricity,meter=mainMeter energy=13660 1625322660000000000
readings,location=xyz,medium=Electricity,meter=mainMeter energy=13810 1627839610000000000
These are monthly readings, not aligned to the beginning of a month (one is on the 3rd of July, the other on the 1st of August).
My goal is to interpolate these readings on a daily basis, so I stumbled upon the not-so-well-documented interpolate.linear function from Flux (https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/interpolate/linear/).
But the only output I can generate with my query returns the two given data values from my input.
import "interpolate"
from(bucket: "ManualInput")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "readings")
|> filter(fn: (r) => r["_field"] == "energy")
|> interpolate.linear(every: 1d)
Am I missing something here? I expected to get a linearly interpolated value for each day... or is this not possible with Flux? (I'm using v2.0.7)
I propose to add a yield() function.
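That is, the query from the question with the yield step appended, otherwise unchanged:

import "interpolate"

from(bucket: "ManualInput")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "readings")
    |> filter(fn: (r) => r["_field"] == "energy")
    |> interpolate.linear(every: 1d)
    |> yield(name: "interpolated")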

Grouping by increasing stateDuration resets using Flux in InfluxDB

I am recording the period between application heartbeats into InfluxDB.
The "target" period is 2000 ms.
If the period is above 2750 ms, it is defined as a "lag event".
My end objective is to run statistics on "how long" we run without lag events.
I switched to Flux from InfluxQL so that I could use the stateDuration() method.
Using the method below, I am able to collect the increasing durations. At lag events, state_duration is then reset to -1.
from(bucket: "sampledb/autogen")
    |> range(start: -1h)
    |> filter(fn: (r) =>
        r._measurement == "timers" and
        r._field == "HeartbeatMs" and
        r.character == "Tarek"
    )
    |> stateDuration(fn: (r) => r._value <= 2750, column: "state_duration", unit: 1s)
    |> keep(columns: ["_time", "state_duration"])
At this point, I would like to collect max(state_duration) for each stretch between lag events, and this is where I get stuck, trying to "group by every new stateDuration sequence"/"group by increasing stateDurations"...
I was thinking that it might be possible to use reduce() or map() to inject a sequence number to group by, somehow increasing that number whenever there is a -1 in state_duration.
Below is a graph of state_duration when running the Flux query; I am basically trying to capture the value at the top of each peak.
Any help is appreciated, including doing this e.g. in InfluxQL or with continuous queries.
Data looks like below when exported to csv:
"time","timers.HeartbeatMs","timers.character"
"2021-01-12T14:49:34.000+01:00","2717","Tarek"
"2021-01-12T14:49:36.000+01:00","1282","Tarek"
"2021-01-12T14:49:38.000+01:00","2015","Tarek"
"2021-01-12T14:49:40.000+01:00","1984","Tarek"
"2021-01-12T14:49:42.000+01:00","2140","Tarek"
"2021-01-12T14:49:44.000+01:00","1937","Tarek"
"2021-01-12T14:49:46.000+01:00","2405","Tarek"
"2021-01-12T14:49:48.000+01:00","2312","Tarek"
"2021-01-12T14:49:50.000+01:00","1453","Tarek"
"2021-01-12T14:49:52.000+01:00","1890","Tarek"
"2021-01-12T14:49:54.000+01:00","2077","Tarek"
"2021-01-12T14:49:56.000+01:00","2250","Tarek"
"2021-01-12T14:49:59.000+01:00","2360","Tarek"
"2021-01-12T14:50:00.000+01:00","1453","Tarek"
"2021-01-12T14:50:02.000+01:00","1952","Tarek"
"2021-01-12T14:50:04.000+01:00","2108","Tarek"
"2021-01-12T14:50:06.000+01:00","2485","Tarek"
"2021-01-12T14:50:08.000+01:00","1437","Tarek"
"2021-01-12T14:50:10.000+01:00","2421","Tarek"
"2021-01-12T14:50:12.000+01:00","1483","Tarek"
"2021-01-12T14:50:14.000+01:00","2344","Tarek"
"2021-01-12T14:50:17.000+01:00","2437","Tarek"
"2021-01-12T14:50:18.000+01:00","1092","Tarek"
"2021-01-12T14:50:20.000+01:00","1969","Tarek"
"2021-01-12T14:50:22.000+01:00","2359","Tarek"
"2021-01-12T14:50:24.000+01:00","2140","Tarek"
"2021-01-12T14:50:27.000+01:00","2421","Tarek"
There are two ways I can think of. One is to look for the inverted state. The other is to use elapsed() to find the interval plus timeShift() to emulate LAG().
I don't like the latter, though I don't think the first is intuitive either :-(. I really hope features like LAG() or CurrentRecordIndex() become available in Flux.
from(bucket: "sampledb/autogen")
    |> range(start: -1h)
    |> filter(fn: (r) =>
        r._measurement == "timers" and
        r._field == "HeartbeatMs" and
        r.character == "Tarek"
    )
    // Look for the inverted state (the lag events themselves)
    |> stateDuration(fn: (r) => r._value > 2750, column: "inverted_state_duration", unit: 1s)
    |> keep(columns: ["_time", "inverted_state_duration"])
    // Keep only the records outside lag events
    |> filter(fn: (r) => r["inverted_state_duration"] == -1)
    // Calculate the gap duration with elapsed()
    |> elapsed(columnName: "state_duration")
    // Placeholder threshold: replace with a literal greater than
    // max(stateDuration unit, your record interval), e.g. 3
    |> filter(fn: (r) => r["state_duration"] > ${ max(stateDuration.unit, record interval) })

How do I "check" (alert on) an aggregate in InfluxDB 2.0 over a rolling window?

I want to raise an alarm when the count of a particular kind of event is less than 5 for the 3 hours leading up to the moment the check is evaluated, but I need to do this check every 15 minutes.
Since I need to check more frequently than the span of time I'm measuring, I can't do this based on my raw data (according to the docs, "[the schedule] interval matches the aggregate function interval for the check query"). But I figured I could use a "task" to transform my data into a form that would work.
I was able to aggregate the data in the way that I hoped via a Flux query, and I even saved the resultant rolling count to a dashboard.
from(bucket: "myBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
(r._measurement == "measurementA"))
|> filter(fn: (r) =>
(r._field == "booleanAttributeX"))
|> window(
every: 15m,
period: 3h,
timeColumn: "_time",
startColumn: "_start",
stopColumn: "_stop",
createEmpty: true,
)
|> count()
|> yield(name: "count")
|> to(bucket: "myBucket", org: "myOrg")
This produces the rolling count I want (plotted as a scatterplot).
My hope was that I could just copy-paste this as a new task and get my nice new aggregated dataset. After resolving a couple of syntax errors, I settled on the following task definition:
option v = {timeRangeStart: -12h, timeRangeStop: now()}
option task = {name: "blech", every: 15m}
from(bucket: "myBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
(r._measurement == "measurementA"))
|> filter(fn: (r) =>
(r._field == "booleanAttributeX"))
|> window(
every: 15m,
period: 3h,
timeColumn: "_time",
startColumn: "_start",
stopColumn: "_stop",
createEmpty: true,
)
|> count()
|> yield(name: "count")
|> to(bucket: "myBucket", org: "myOrg")
Unfortunately, I'm stuck on an error that I can't find any mention of anywhere: could not execute task run; Err: no time column detected: no time column detected.
If you could help me debug this task run error, or sidestep it by accomplishing this task in some other manner, I'll be very grateful.
I know I'm late here, but the to function needs a _time column; the count aggregate you are adding returns _start and _stop columns to indicate the time frame of the count, not a _time.
You can solve this by either adding |> duplicate(column: "_stop", as: "_time") just before your to function, or leveraging the aggregateWindow function which handles this for you.
|> aggregateWindow(every: 15m, fn: count)
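Note that aggregateWindow(every: 15m, fn: count) gives 15-minute buckets, not the 3-hour rolling window from the original query; aggregateWindow also accepts a period parameter, so the rolling window can be kept (a sketch using the question's placeholder names):

|> aggregateWindow(every: 15m, period: 3h, fn: count, createEmpty: true)

aggregateWindow windows the data, applies the aggregate, and copies _stop into _time for you, which is exactly what to needs.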
References:
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/count
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/duplicate/
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/aggregatewindow/
