influx query: how to get historical average - influxdb

I am a SQL native struggling with Flux syntax (philosophy?) once again. Here is what I am trying to do: plot values of a certain measurement as a ratio of their historical average (say, over the past month).
Here is as far as I have gotten:
from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> group(columns: ["query"])
|> aggregateWindow(every: v.windowPeriod, fn: sum)
|> timedMovingAverage(every: 1d, period: 30d)
I believe this produces a 30-day average for each daily window. What I don't know how to do is divide the original data by these values to get the relative change, i.e. something like value(_time) / tma_value(_time).

Thanks to @Munun, I got the following code working. I made a few changes since my original post to make things work as I needed.
import "date"
t1 = from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> group(columns: ["query"])
|> aggregateWindow(every: 1h, fn: sum)
|> map(fn: (r) => ({r with window_value: float(v: r._value)}))
t2 = from(bucket: "secret_bucket")
|> range(start: date.sub(from: v.timeRangeStop, d: 45d), stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> mean(column: "_value")
|> group()
|> map(fn: (r) => ({r with avg_value: r._value}))
join(tables: {t1: t1, t2: t2}, on: ["query"])
|> map(fn: (r) => ({r with _value: (r.window_value - r.avg_value)/ r.avg_value * 100.0 }))
|> keep(columns: ["_value", "_time", "query"])

Here are a few steps you could try:
re-add _time after the aggregate function so that you get the same number of records as the original:
|> duplicate(column: "_stop", as: "_time")
calculate the ratio with two data sources via join and map
The final Flux could be:
t1 = from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> group(columns: ["query"])
|> aggregateWindow(every: v.windowPeriod, fn: sum)
|> timedMovingAverage(every: 1d, period: 30d)
|> duplicate(column: "_stop", as: "_time")
t2 = from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
join(tables: {t1: t1, t2: t2}, on: ["hereIsTheTagName"])
|> map(fn: (r) => ({r with _value: r._value_t2 / r._value_t1 * 100.0}))

Related

InfluxdB Flux Query group by substring

I am searching for a way to group by a substring/regex of a column. I searched the documentation at https://docs.influxdata.com/ without success.
The column "proc" is composed as follows:
CPU-010.CORE010-00
CPU-010.CORE010-01
CPU-010.CORE010-02
CPU-010.CORE010-03
CPU-010.CORE010-04
CPU-010.CORE010-05
...
CPU-020.CORE020-00
CPU-020.CORE020-01
CPU-020.CORE020-02
CPU-020.CORE020-03
CPU-020.CORE020-04
...
CPU-110.CORE110-00
CPU-110.CORE110-01
CPU-110.CORE110-02
CPU-110.CORE110-03
CPU-110.CORE110-04
...
CPU-120.CORE120-00
CPU-120.CORE120-01
CPU-120.CORE120-02
CPU-120.CORE120-03
etc..
The imported CSV is composed as follows:
#datatype measurement,tag,tag,double,dateTime:number
Processors,srv,proc,usage,time
Processors,srv1,CPU-010.CORE010-00,52,1671231960
Processors,srv1,CPU-010.CORE010-00,50,1671232020
Processors,srv1,CPU-010.CORE010-00,49,1671232080
Processors,srv1,CPU-010.CORE010-00,50,1671232140
Processors,srv1,CPU-010.CORE010-00,48,1671232200
Processors,srv1,CPU-010.CORE010-00,53,1671232260
...
Processors,srv1,CPU-020.CORE020-00,52,1671231960
Processors,srv1,CPU-020.CORE020-00,50,1671232020
Processors,srv1,CPU-020.CORE020-00,49,1671232080
Processors,srv1,CPU-020.CORE020-00,50,1671232140
Processors,srv1,CPU-020.CORE020-00,48,1671232200
Processors,srv1,CPU-020.CORE020-00,53,1671232260
...
I tried with this query without success:
from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "Processors" and proc.p =~ /(CPU-[09].*)[.].*/)
|> group(columns: ["proc"])
|> aggregateWindow(every: v.windowPeriod, fn: mean)
I would like to group as follows:
CPU-010
CPU-020
CPU-110
CPU-120
Etc..
Many thanks for any help.
I found the correct syntax with your sample code:
import "strings"
from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "Processors")
|> filter(fn: (r) => r["_field"] == "usage")
|> map(fn: (r) => ({r with newProc: strings.substring(v: r.proc, start: 0, end: 7)}))
|> group(columns: ["newProc"])
|> aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean)
|> yield(name: "mean")
Many thanks for your help.
Since you are trying to group by the prefix of the column names, you could try following:
create a new column based on the prefix
group by the new column
Sample code is as follows:
import "strings"
from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "Processors")
|> map(fn: (r) => ({r with newProc: strings.substring(v: r.proc, start: 0, end: 7)}))
|> group(columns: ["newProc"])
|> aggregateWindow(every: v.windowPeriod, fn: mean)
Note: you can specify the column explicitly, as in |> aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean). Before trying this, comment out the aggregation line to inspect the intermediate results, especially the column names.
See the documentation for the map, strings.substring, and aggregateWindow functions for more details.
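If the prefix length ever varies, a regular-expression extraction could replace the fixed-length substring. This is a sketch, not part of the accepted answer, assuming the proc tag always starts with a CPU-<digits> prefix:

```flux
import "regexp"

from(bucket: v.bucket)
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "Processors")
    |> filter(fn: (r) => r._field == "usage")
    // Extract the "CPU-NNN" prefix with a regex instead of a fixed-length substring
    |> map(fn: (r) => ({r with newProc: regexp.findString(r: /CPU-[0-9]+/, v: r.proc)}))
    |> group(columns: ["newProc"])
    |> aggregateWindow(every: v.windowPeriod, fn: mean)
```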

influxDB - Get daily Max. value

I have data with hydrological measurements.
I want to get the daily maximum water flow:
from(bucket: "API")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "hydro")
|> filter(fn: (r) => r["_field"] == "temperature")
|> filter(fn: (r) => r["loc"] == "XXX")
|> aggregateWindow(every: v.windowPeriod, fn: max, createEmpty: false)
|> yield(name: "max")
For some days this returns multiple measurements per day, but not always.
How do I get only the max entry per day?
You need to set the every parameter of aggregateWindow to 1d; with v.windowPeriod, the window size follows the dashboard's zoom level and can be shorter than a day:
from(bucket: "API")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "hydro")
|> filter(fn: (r) => r["_field"] == "temperature")
|> filter(fn: (r) => r["loc"] == "XXX")
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
|> yield(name: "max")
See the Flux documentation for more details.

InfluxDB Flux - only show values over a certain %

I am using InfluxDB with Grafana and only want it to show values over a certain percentage. Any ideas how I could do this?
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "vsphere_datastore_disk")
|> filter(fn: (r) => r["_field"] == "capacity_latest" or r["_field"] == "used_latest")
|> filter(fn: (r) => r["source"] =~ /.*/)
|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
|> map(fn: (r) => ({ r with _value: float(v: r.used_latest) / float(v: r.capacity_latest) * 100.0 }))
|> group(columns: ["source","_field"])
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
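The query above computes the percentage but never filters on it. Adding a filter() after the map() should keep only rows over the cutoff. A sketch follows; the bucket name "example_bucket" and the 80.0 threshold are placeholders, not from the original post:

```flux
from(bucket: "example_bucket")  // hypothetical bucket name
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "vsphere_datastore_disk")
    |> filter(fn: (r) => r._field == "capacity_latest" or r._field == "used_latest")
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> map(fn: (r) => ({r with _value: float(v: r.used_latest) / float(v: r.capacity_latest) * 100.0}))
    // Keep only rows over the threshold; 80.0 is an example value
    |> filter(fn: (r) => r._value > 80.0)
    |> group(columns: ["source"])
    |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
```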

Influxdb Flux query with custom window aggregate function

Could you please help me with the InfluxDB 2 Flux query syntax for building a windowed query with a custom aggregate function?
I went through the online docs, but they seem to be lacking examples on how to get to the actual window content (first, last records) from within the custom aggregate function. It also doesn't immediately describe the expected signature of the custom functions.
I'd like to build a query with a sliding window that would produce a difference between the first and the last value in the window. Something along these lines:
difference = (column, tables=<-) => ({ tables.last() - tables.first() })
from(bucket: "my-bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "simple")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: 1mo, fn: difference, column: "_value", timeSrc: "_stop", timeDst: "_time", createEmpty: true)
|> yield(name: "diff")
The syntax of the above example is obviously wrong, but hopefully you can understand, what I'm trying to do.
Thank you!
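On the expected signature: as the self-answer below also shows, a function passed as fn to aggregateWindow() receives the column name and the piped-in stream of tables, and must return a stream of tables. A minimal sketch of that shape, delegating to a built-in:

```flux
// A custom aggregate has the same shape as the built-ins: it takes the
// column name and a stream of tables, and returns a stream of tables.
myAgg = (column, tables=<-) => tables |> mean(column: column)

// Used like any built-in aggregate:
// |> aggregateWindow(every: 1mo, fn: myAgg)
```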
Came up with the following. It works at least syntactically:
from(bucket: "my-bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "simple")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(
every: 1mo,
fn: (column, tables=<-) => tables |> reduce(
identity: {first: -1.0, last: -1.0, diff: -1.0},
fn: (r, acc) => ({
first:
if acc.first < 0.0 then r._value
else acc.first,
last:
r._value,
diff:
if acc.first < 0.0 then 0.0
else (acc.last - acc.first)
})
)
|> drop(columns: ["first", "last"])
|> set(key: "_field", value: column)
|> rename(columns: {diff: "_value"})
)
|> yield(name: "diff")
The window above is not really sliding, though. Here is the same logic with a sliding window:
from(bucket: "my-bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "simple")
|> filter(fn: (r) => r["_field"] == "value")
|> window(every: 1h, period: 1mo)
|> reduce(
identity: {first: -1.0, last: -1.0, diff: -1.0},
fn: (r, acc) => ({
first:
if acc.first < 0.0 then r._value
else acc.first,
last:
r._value,
diff:
if acc.first < 0.0 then 0.0
else (acc.last - acc.first)
})
)
|> duplicate(column: "_stop", as: "_time")
|> drop(columns: ["first", "last"])
|> rename(columns: {diff: "_value"})
|> window(every: inf)

InfluxDB Flux - Getting last and first values as a column

I am trying to create two new columns with the first and last values using the last() and first() functions. However, this doesn't work when I try to map the new columns. Here is a code sample. Is this possible using Flux?
from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "price_info")
|> filter(fn: (r) => r["_field"] == "price")
|> map(fn: (r) => ({r with
open: last(float(v: r._value)),
close: first(float(v: r._value)),
})
I am not answering the question directly, but it might help. I wanted to perform a calculation between the first and last values; here is my method, though I have no idea whether it is the right way to do it.
The idea is to create two tables, one holding only the first value and the other only the last value, and then perform a union of the two.
data = from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "plop")
l = data
|> last()
|> map(fn:(r) => ({ r with _time: time(v: "2011-01-01T01:01:01.0Z") }))
f = data
|> first()
|> map(fn:(r) => ({ r with _time: time(v: "2010-01-01T01:01:01.0Z") }))
union(tables: [f, l])
|> sort(columns: ["_time"])
|> difference()
I have to set artificial dates just to be able to sort the values so that first comes before last (difference() subtracts successive rows in their sort order).
Just a quick thank you. I was struggling with this as well. This is my code now:
First = from(bucket: "FirstBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["topic"] == "Counters/Watermeter 1")
|> filter(fn: (r) => r["_field"] == "Counter")
|> first()
|> yield(name: "First")
Last = from(bucket: "FirstBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["topic"] == "Counters/Watermeter 1")
|> filter(fn: (r) => r["_field"] == "Counter")
|> last()
|> yield(name: "Last")
union(tables: [First, Last])
|> difference()
The simple answer is to use join. (You may also use the old join; when using the "new" join, remember to import "join".)
Example:
import "join"
balance_asset_gen = from(bucket: "telegraf")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "balance")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
balance_asset_raw = from(bucket: "telegraf")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "balance_raw")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
// In my example I merge two data sources but you may just use 1 data source
balances_merged = union(tables: [balance_asset_gen, balance_asset_raw])
|> group(columns:["_time"], mode:"by")
|> sum()
f = balances_merged |> first()
l = balances_merged |> last()
// Watch out: here we assume we work on a single table (no groups / one group)
join.left(
left: f,
right: l,
on: (l, r) => l.my_tag == r.my_tag, // pick what to join on, e.g. l._measurement == r._measurement
as: (l, r) => ({
_time: r._time,
_start: l._time,
_stop: r._time,
_value: (r._value / l._value), // we can calculate new field
first_value: l._value,
last_value: r._value,
}),
)
|> yield()
