InfluxdB Flux Query group by substring - influxdb

I search a way to group by substring/regex of a column:
I search in the documentation https://docs.influxdata.com/ without success.
The column "proc" is composed as follow:
CPU-010.CORE010-00
CPU-010.CORE010-01
CPU-010.CORE010-02
CPU-010.CORE010-03
CPU-010.CORE010-04
CPU-010.CORE010-05
...
CPU-020.CORE020-00
CPU-020.CORE020-01
CPU-020.CORE020-02
CPU-020.CORE020-03
CPU-020.CORE020-04
...
CPU-110.CORE110-00
CPU-110.CORE110-01
CPU-110.CORE110-02
CPU-110.CORE110-03
CPU-110.CORE110-04
...
CPU-120.CORE120-00
CPU-120.CORE120-01
CPU-120.CORE120-02
CPU-120.CORE120-03
etc..
The csv imported is composed as follow:
#datatype measurement,tag,tag,double,dateTime:number
Processors,srv,proc,usage,time
Processors,srv1,CPU-010.CORE010-00,52,1671231960
Processors,srv1,CPU-010.CORE010-00,50,1671232020
Processors,srv1,CPU-010.CORE010-00,49,1671232080
Processors,srv1,CPU-010.CORE010-00,50,1671232140
Processors,srv1,CPU-010.CORE010-00,48,1671232200
Processors,srv1,CPU-010.CORE010-00,53,1671232260
...
Processors,srv1,CPU-020.CORE020-00,52,1671231960
Processors,srv1,CPU-020.CORE020-00,50,1671232020
Processors,srv1,CPU-020.CORE020-00,49,1671232080
Processors,srv1,CPU-020.CORE020-00,50,1671232140
Processors,srv1,CPU-020.CORE020-00,48,1671232200
Processors,srv1,CPU-020.CORE020-00,53,1671232260
...
I tried with this query without success:
from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "Processors" and proc.p =~ /(CPU-[09].*)[.].*/)
|> group(columns: ["proc"])
|> aggregateWindow(every: v.windowPeriod, fn: mean)
I tried to group as follow:
CPU-010
CPU-020
CPU-110
CPU-120
Etc..
Many thank for any help

I found the correct syntax with your sample code:
import "strings"
from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "Processors")
|> filter(fn: (r) => r["_field"] == "usage")
|> map(fn: (r) => ({r with newProc: strings.substring(v: r.proc, start: 0, end: 7)}))
|> group(columns: ["newProc"])
|> aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean)
|> yield(name: "mean")
Many thanks for your help.

Since you are trying to group by the prefix of the column names, you could try following:
create a new column based on the prefix
group by the new column
Sample code is as follows:
from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "Processors")
|> map(fn: (r) => ({r with newProc: strings.substring(v: r._value, start: 0, end: 7)}))
|> group(columns: ["newProc"])
|> aggregateWindow(every: v.windowPeriod, fn: mean)
Notice: |> aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean) ou can specify the column this way. Before trying this, try comment out this line to see the results before the aggregation, especially the column names.
See more details for map function and substring function and aggregateWindow function.

Related

Sum values in InfluxDB

I’m tryingto sum values in InfluxDB but I’m struggling a bit.
So, I have a _measurement "plug" with a field "value".
I have different records within the same bucket with a different ID tag.
I can get the evolution of 1 plug with this query:
from(bucket: "test-bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "plug")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["id"] == "tag1")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
What I would like is the exact same graph with the sum of all r["id"].
So, if there is 34 for tag ID "tag1", 11.2 for "tag2" and 0 for "tag3", I would like a graph with 45.2 for that given time.
I’ve tried to use «group()» method, but I get a strange value, more like an average than a sum.
I’ve also tried to use «sum» method, but then, I feel like Influx is summing all the values across the whole timeline. That’s not what I want.
I just like to have a graph with with the sum of «value» field of all "tag" at a given time.
Thanks a lot for you help.
Right now you have a table per tag value. You can use pivot function to merge into a single table, where all the different _value columns are now named after their corresponding tag value:
from(bucket: "test-bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "plug")
|> filter(fn: (r) => r._field == "value")
|> pivot(rowKey: ["_time"], columnKey: ["id"], valueColumn: "_value")
If you know the tag values in advance, the next step is easy:
|> map(fn: (r) => ({ _time: r._time, _value: r["tag1"] + r["tag2"] + r["tag3]}))
If you don't, it gets a bit more complicated. What I would try next in this case is to write a function that combines experimental.unpivot() (note: available since InfluxDB 2.4) with sum(). The trick here is to call this method within map(), so it will operate on a single row (ie, single timestamp) at a time:
sumColumns = (r) => r
|>experimental.unpivot()
|>group
|>sum()
|>findRecord(fn: (key) => true, idx: 0)
from(bucket: "test-bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "plug")
|> filter(fn: (r) => r._field == "value")
|> pivot(rowKey: ["_time"], columnKey: ["id"], valueColumn: "_value")
|> map(fn: sumColumns)
Note that I have not tested this. It is just to give you an idea.

influx query: how to get historical average

I am SQL native struggling with flux syntax (philosophy?) once again. Here is what I am trying to do: plot values of a certain measurement as a ratio of their historical average (say over the past month).
Here is as far as I have gotten:
from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> group(columns: ["query"])
|> aggregateWindow(every: v.windowPeriod, fn: sum)
|> timedMovingAverage(every: 1d, period: 30d)
I believe this produces an average over the past 30 days, for each day window. Now what I don't know how to do is divide the original data by these values in order to get the relative change, i.e. something like value(_time)/tma_value(_time).
Thanks to #Munun, I got the following code working. I made a few changes since my original post to make things work as I needed.
import "date"
t1 = from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> group(columns: ["query"])
|> aggregateWindow(every: 1h, fn: sum)
|> map(fn: (r) => ({r with window_value: float(v: r._value)}))
t2 = from(bucket: "secret_bucket")
|> range(start: date.sub(from: v.timeRangeStop, d: 45d), stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> mean(column: "_value")
|> group()
|> map(fn: (r) => ({r with avg_value: r._value}))
join(tables: {t1: t1, t2: t2}, on: ["query"])
|> map(fn: (r) => ({r with _value: (r.window_value - r.avg_value)/ r.avg_value * 100.0 }))
|> keep(columns: ["_value", "_time", "query"])
Here are few steps you could try:
re-add _time after the aggregate function so that you can have same number of records as the original one:
|> duplicate(column: "_stop", as: "_time")
calculate the ratio with two data sources via join and map
The final Flux could be:
t1 = from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> group(columns: ["query"])
|> aggregateWindow(every: v.windowPeriod, fn: sum)
|> timedMovingAverage(every: 1d, period: 30d)
|> duplicate(column: "_stop", as: "_time")
t2 = from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
join(tables: {t1: t1, t2: t2}, on: ["hereIsTheTagName"])
|> map(fn: (r) => ({r with _value: r._value_t2 / r._value_t1 * 100.0}))

InfluxDB2.0 How to calculate daily value over bigger timeframe?

I'm migrating my InfluxDB1.8 version to InfluxDB2.0
I'm using a influxDB2.0 database and use grafana to display results.
What I insert as data are the results of my P1 meter, altough the results are total values, I would like to calculate and display the daily results.
What is being inserting is the current (gas usage) value. By calculating the difference of the begin and end of the day, I have my daily usage result.
I did find out a way to do this for 1 day. With the Spread function. But I don't get it working for a longer timeframe then 1 day.
But now to display this on a daily usage on a longer timeframe. I didn't find the right option to get this working
Week results
Anyone an idea?
Query for 1 day:
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "Gas-usage")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> spread(column: "_value")```
I did some checks on the 1.8 one and what works there is:
SELECT spread("value")
FROM "Gas-usage"
WHERE $timeFilter
GROUP BY time(1d) fill(null) tz('Europe/Berlin')
what is the equivalant of this query in influxdb 2.0 ?
Try change your aggregate window, like this:
|> aggregateWindow(every: 1d, fn: mean)
use the spread function inside your aggreagateWindow function.
should be like this:
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "Gas-usage")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
from(bucket: "${bucket}")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "system")
|> filter(fn: (r) => r.host == "${host}")
|> filter(fn: (r) => r["_field"] == "uptime")
|> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
result of my grafana

How to get a query variable on InfluxDB 2.0 dashboard?

I read the documentation https://docs.influxdata.com/influxdb/v2.0/visualize-data/variables/
I thought great that will be a piece of cake.
I take a look at an existing query variable named bucket:
buckets()
|> filter(fn: (r) => r.name !~ /^_/)
|> rename(columns: {name: "_value"})
|> keep(columns: ["_value"])
It returns this data:
#group,false,false,false
#datatype,string,long,string
#default,_result,,
,result,table,_value
,,0,pool
,,0,test
The bucket variable works and I can refer to it as v.bucket in the cell queries of any dashboard.
Building on this example I craft the following query:
from(bucket: "pool")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "minerstat")
|> keep(columns: ["account"])
|> distinct(column: "account")
|> keep(columns: ["_value"])
That returns this data:
#group,false,false,false
#datatype,string,long,string
#default,_result,,
,result,table,_value
,,0,0x04ff4e0c05c0feacccf93251c52a78639e0abef4
,,0,0x201f1a58f31801dcd09dc75616fa40e07a70467f
,,0,0x80475710b08ef41f5361e07ad5a815eb3b11ed7b
,,0,0xa68a71f0529a864319082c2475cb4e495a5580fd
And I save it as a query variable with the name account.
Then I use it in a dashboard cell query like this:
from(bucket: "pool")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "minerstat")
|> filter(fn: (r) => r["account"] == v.account)
|> filter(fn: (r) => r["_field"] == "currentHashrate" or r["_field"] == "hashrate")
|> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
|> yield(name: "last")
But this returns no data. And the dropdown menu for the account variable on the dashboard view is empty.
If I replace v.account above with one of the value returned by the query behind the variable:
from(bucket: "pool")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "minerstat")
|> filter(fn: (r) => r["account"] == "0x04ff4e0c05c0feacccf93251c52a78639e0abef4")
|> filter(fn: (r) => r["_field"] == "currentHashrate" or r["_field"] == "hashrate")
|> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
|> yield(name: "last")
That works as intended and display a nice graph.
What am I missing here?
SOLUTION: you cannot use variables inside the definition of a variable.
I replaced
start: v.timeRangeStart, stop: v.timeRangeStop
with
start: -1d
in the variable definition:
from(bucket: "pool")
|> range(start: -1d)
|> filter(fn: (r) => r._measurement == "minerstat")
|> keep(columns: ["account"])
|> distinct(column: "account")
|> keep(columns: ["_value"])
I don't think you can use variables within variables, so things like v.timeRangeStart that you can use in a dashboard query can't be used to define another dashboard variable.
You can use duration literals though, like -5d or -2h in your range() call though.

InfluxDB Flux - Getting last and first values as a column

I am trying to create two new columns with the first and last values using the last() and first() functions. However the function isn’t working when I try to map the new columns. Here is the sample code below. Is this possible using Flux?
from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "price_info")
|> filter(fn: (r) => r["_field"] == "price")
|> map(fn: (r) => ({r with
open: last(float(v: r._value)),
close: first(float(v: r._value)),
})
I am not answering directly to the question, however it might help.
I wanted to perform some calculation between first and last, here is my method, I have no idea if it is the right way to do.
The idea is to create 2 tables, one with only the first value and the other with only the last value, then to perform a union between both.
data = from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "plop")
l = data
|> last()
|> map(fn:(r) => ({ r with _time: time(v: "2011-01-01T01:01:01.0Z") }))
f = data
|> first()
|> map(fn:(r) => ({ r with _time: time(v: "2010-01-01T01:01:01.0Z") }))
union(tables: [f, l])
|> sort(columns: ["_time"])
|> difference()
For an unknown reason I have to set wrong date, just to be able to sort values and take into account than first is before last.
Just a quick thank you. I was struggeling with this as well. This is my code now:
First = from(bucket: "FirstBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["topic"] == "Counters/Watermeter 1")
|> filter(fn: (r) => r["_field"] == "Counter")
|> first()
|> yield(name: "First")
Last = from(bucket: "FirstBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["topic"] == "Counters/Watermeter 1")
|> filter(fn: (r) => r["_field"] == "Counter")
|> last()
|> yield(name: "Last")
union(tables: [First, Last])
|> difference()
Simple answer is to use join (You may also use old join, when using "new" join remember to import "join")
Example:
import "join"
balance_asset_gen = from(bucket: "telegraf")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "balance")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
balance_asset_raw = from(bucket: "telegraf")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "balance_raw")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
// In my example I merge two data sources but you may just use 1 data source
balances_merged = union(tables: [balance_asset_gen, balance_asset_raw])
|> group(columns:["_time"], mode:"by")
|> sum()
f = balances_merged |> first()
l = balances_merged |> last()
// Watch out, here we assume we work on single TABLE (we don't have groups/one group)
join.left(
left: f,
right: l,
on: (l, r) => l.my_tag == r.my_tag, // pick on what to merge e.g. l._measurement == r._measurement
as: (l, r) => ({
_time: r._time,
_start: l._time,
_stop: r._time,
_value: (r._value / l._value), // we can calculate new field
first_value: l._value,
last_value: r._value,
}),
)
|> yield()

Resources