InfluxQL to Flux - influxdb

What is the equivalent of this query
select sum("count") from "measurement_name" where time<now() and time>now()-4d group by time(100s),"source"
in Flux? I have tried
from(bucket:"metrics/default_metrics")
|> range(start: -4d)
|> filter(fn: (r)=> r._measurement == "measurement_name")
|> group(columns: ["source"])
|> window(every: 100s)
|> drop(columns:["_start","_stop","_measurement","column_a","column_b"])
|> yield()
and
from(bucket:"metrics/default_metrics")
|> range(start: -4d)
|> filter(fn: (r)=> r._measurement == "measurement_name")
|> window(every: 100s)
|> group(columns: ["source"])
|> drop(columns:["_start","_stop","_measurement","column_a","column_b"])
|> yield()
but they both seem to yield different results.
This is grouping by time_interval = 100s and source. Supposedly the grouping by time (and, implicitly, the sum aggregation?) is done with Flux's window() function, but the results of the InfluxQL query (select ...) are:
name: measurement_name
tags: source=source_name
time sum
---- ---
1601022500000000000 39
1601022600000000000 191
1601022700000000000 232
1601022800000000000 145
1601022900000000000 207
1601023000000000000 277
1601023100000000000 160
1601023200000000000 228
1601023300000000000 253
1601023400000000000 167
while the one coming from the Flux queries is
Table: keys: [source]
source:string _time:time _value:int _field:string
---------------------- ------------------------------ -------------------------- --------
source_name 2020-09-25T11:46:51.390000000Z 6 count
source_name 2020-09-25T11:46:54.124000000Z 5 count
source_name 2020-09-25T11:46:57.616000000Z 6 count
source_name 2020-09-25T11:46:57.999000000Z 9 count
source_name 2020-09-25T11:46:58.064000000Z 3 count
source_name 2020-09-25T11:46:58.307000000Z 6 count
source_name 2020-09-25T11:47:01.011000000Z 8 count
source_name 2020-09-25T11:47:03.634000000Z 6 count
source_name 2020-09-25T11:47:03.700000000Z 8 count
source_name 2020-09-25T11:47:04.144000000Z 8 count
The end goal is to plot this out in Grafana.
Is there also maybe a way to convert back and forth between these two paradigms? Whenever it's possible, of course.

You need to include the sum() function explicitly. I suggest using aggregateWindow() too.
from(bucket:"metrics/default_metrics")
|> range(start: -4d)
|> filter(fn: (r)=> r._measurement == "measurement_name")
|> group(columns: ["source"])
|> aggregateWindow(every: 100s, fn: sum)
|> drop(columns:["_start","_stop","_measurement","column_a","column_b"])
|> yield()
Generally, group() does not preserve sort order, but aggregateWindow() sorts by time before doing its work, so that's something to watch for. Additionally, the time bounds might not line up exactly between the Flux and InfluxQL queries. I expect them to, but double-check.
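If the timestamps do end up shifted between the two, one likely cause is that InfluxQL labels each window with its start time, while aggregateWindow() stamps each aggregate with the window's stop time by default. The timeSrc parameter controls this; a sketch using the same query:

```flux
from(bucket: "metrics/default_metrics")
    |> range(start: -4d)
    |> filter(fn: (r) => r._measurement == "measurement_name")
    |> group(columns: ["source"])
    // timeSrc: "_start" stamps each sum with the window start,
    // matching InfluxQL's GROUP BY time(100s) timestamps
    |> aggregateWindow(every: 100s, fn: sum, timeSrc: "_start")
```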

Related

InfluxDB: apply function on a 5 minute slice of data to generate the output stream

I'm a beginner with InfluxDB. I'm trying to solve the following problem:
I have 2 streams of data from 2 temperature sensors. I need to get a stream of correlations, one for each 5-minute slice, i.e. an output stream with 36 values in this case (last 3 hours) that I can plot on a graph.
My script (that I tried in the script editor) is:
t1 = from(bucket: "sensor1")
|> range(start: -3h)
|> filter(fn: (r) => r["_measurement"] == "temp1" and r["_field"] == "avg")
t2 = from(bucket: "sensor2")
|> range(start: -3h)
|> filter(fn: (r) => r["_measurement"] == "temp2" and r["_field"] == "avg")
t3 = cov(x: t1, y: t2, on: ["_time"], pearsonr: true)
|> yield(name: "cov")
If I execute the above (in the InfluxDB 2.4 script editor) I get the Pearson correlation calculated over the whole range (a single value).
I tried to figure out the syntax from using the query builder, then switching to script editor, to see the generated code, but I failed.
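One untested sketch of the per-slice correlation: window both streams into 5-minute tables before calling cov(), so the join and Pearson computation run once per window instead of once over the whole range. Whether cov() pairs up the windowed tables this way should be verified against your Flux version:

```flux
t1 = from(bucket: "sensor1")
    |> range(start: -3h)
    |> filter(fn: (r) => r._measurement == "temp1" and r._field == "avg")
    |> window(every: 5m)  // split into 5-minute tables

t2 = from(bucket: "sensor2")
    |> range(start: -3h)
    |> filter(fn: (r) => r._measurement == "temp2" and r._field == "avg")
    |> window(every: 5m)

cov(x: t1, y: t2, on: ["_time", "_start", "_stop"], pearsonr: true)
    |> duplicate(column: "_stop", as: "_time")  // give each correlation a plottable timestamp
    |> window(every: inf)                       // merge per-window results into one table
    |> yield(name: "cov")
```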

How to overwrite latest aggregated row in Influxdb?

I have an Influxdb task that aggregates yearly data. It is run every 1 minute as current data are still changing.
option task = {name: "Yearly Throughput", every: 1m}
from(bucket: "my_bucket")
|> range(start: -3y)
|> filter(fn: (r) => r._measurement == "throughput")
|> aggregateWindow(every: 1y, fn: sum)
|> fill(value: 0)
|> set(key: "_measurement", value: "yearly_throughput")
|> to(bucket: "my_bucket")
Because the last row changes every minute, a new series is written to the yearly_throughput measurement (the _time column of the last row is different every time). Any idea how to overwrite the latest year's series?
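One possible cause worth checking: aggregateWindow() stamps each aggregate with the window's stop time by default, and for the still-open last window that stop moves with now() on every run. Stamping with the window start instead gives the current year a fixed timestamp, so to() overwrites the same point each run. A sketch, untested against this schema:

```flux
option task = {name: "Yearly Throughput", every: 1m}

from(bucket: "my_bucket")
    |> range(start: -3y)
    |> filter(fn: (r) => r._measurement == "throughput")
    // timeSrc: "_start" pins each point to the fixed start of its year,
    // so successive task runs rewrite the same _time instead of a new one
    |> aggregateWindow(every: 1y, fn: sum, timeSrc: "_start")
    |> fill(value: 0)
    |> set(key: "_measurement", value: "yearly_throughput")
    |> to(bucket: "my_bucket")
```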

What is the query to select multiple columns and group by one of the column in InfluxDb using Flux?

How to write similar query using Flux:
SELECT field_a,field_b from 'measurement' where field_a = 10 and group by field_b
I'm afraid the InfluxQL above won't work, as InfluxDB currently only supports tags and time intervals in the GROUP BY clause, not fields. This can be inferred from the syntax of the GROUP BY clause (for more information, refer to the InfluxDB documentation).
Nevertheless, if you are grouping by some tag as follows:
SELECT field_a,tag_b from 'measurement' where field_a = 10 and group by tag_b
This is the equivalent Flux query:
from(bucket: "thisIsYourBucketInInfluxDBV2")
// specify start:0 to query from all time. Equivalent to SELECT * from db1. Use just as cautiously.
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "measurement" and r._field == "field_a" and r._value == 10)
Here is a guide for you to migrate your InfluxQL to Flux.
You can query several fields using a regex. And you can group by fields if you pivot your result table using the schema.fieldsAsCols() function; that way, the result of the query has columns that have the names of the queried fields. See this query:
import "influxdata/influxdb/schema"
from(bucket: "yourBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "measurement")
|> filter(fn: (r) => r["_field"] =~ /^(field_a|field_b)$/)
|> aggregateWindow(every: v.windowPeriod, fn: first, createEmpty: false)
//|> group()
//|> sort(columns: ["_time"])
|> schema.fieldsAsCols()
|> filter(fn: (r) => r.field_a == 10)
|> group(columns: ["field_b"])
//|> max(column: "field_b")
|> yield()
Two remarks :
To make sure that you have only one table before you group by field_b, uncomment the lines |> group() and |> sort(columns: ["_time"]). The first ungroups the result, which is otherwise split into separate tables per tag value (if you have any tags). The latter sorts the ungrouped result by timestamp.
Since there is no aggregation in your initial query, the Flux query outputs several result tables, one per distinct value of field_b. If you are, for example, interested in the max of field_a for every group, uncomment the line before |> yield().

How to create Influxdb alert for deviating from average hourly values?

So I'm trying to find any documentation on more complex Flux queries but after days of searching I'm still lost. I want to be able to calculate average values for each hour of the week and then when new data comes in I want to check if it deviates by x standard deviations for that hour.
Basically I want to have 24x7 array fields each representing the mean/median value for each hour of the week for the last 1 year. Then I want to compare last days values for each hour against these averages and report an error. I do not understand how to calculate these averages. Is there some hidden extensive documentation on Flux?
I don't really need a full solution, just some direction would be nice. Like, are there some utility functions for this in the standard lib or whatever
EDIT: After some reading, it really looks like all I need to do is use the window and aggregateWindow functions but I haven't yet found how exactly
Ok, so, this is what worked for me. It needs some cleaning up, but it successfully groups the values per hour + weekday and computes the mean of each group:
import "date"
tab1 = from(bucket: "qweqwe")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "asdasd")
|> filter(fn: (r) => r["_field"] == "reach")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
mapped = tab1
|> map(fn: (r) => ({ r with wd: string(v: date.weekDay(t: r._time)), h: string(v: date.hour(t: r._time)) }))
|> map(fn: (r) => ({ r with mapped_time: r.wd + " " + r.h }))
grouped = mapped
|> group(columns: ["mapped_time"], mode: "by")
|> mean()
|> group()
|> toInt()
|> yield()
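To turn these baselines into an actual check, one hypothetical sketch (the bucket, measurement, field, and the 3-sigma threshold are all assumptions carried over from the query above; untested) is to compute a standard deviation per hour-of-week group as well, then join both with the last day's hourly values:

```flux
import "date"

// same per-hour pipeline as above, parameterized by range start
hourly = (start) => from(bucket: "qweqwe")
    |> range(start: start)
    |> filter(fn: (r) => r._measurement == "asdasd" and r._field == "reach")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
    |> map(fn: (r) => ({r with mapped_time:
        string(v: date.weekDay(t: r._time)) + " " + string(v: date.hour(t: r._time))}))
    |> group(columns: ["mapped_time"])

means = hourly(start: -1y) |> mean()
stddevs = hourly(start: -1y) |> stddev()

// conflicting _value columns get the table keys as suffixes: _value_m, _value_s
baseline = join(tables: {m: means, s: stddevs}, on: ["mapped_time"])

join(tables: {cur: hourly(start: -1d), base: baseline}, on: ["mapped_time"])
    |> map(fn: (r) => ({r with
        deviates: r._value > r._value_m + 3.0 * r._value_s
            or r._value < r._value_m - 3.0 * r._value_s,
    }))
    |> yield(name: "check")
```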

How do I "check" (alert on) an aggregate in InfluxDB 2.0 over a rolling window?

I want to raise an alarm when the count of a particular kind of event is less than 5 for the 3 hours leading up to the moment the check is evaluated, but I need to do this check every 15 minutes.
Since I need to check more frequently than the span of time I'm measuring, I can't do this based on my raw data (according to the docs, "[the schedule] interval matches the aggregate function interval for the check query"). But I figured I could use a "task" to transform my data into a form that would work.
I was able to aggregate the data in the way that I hoped via a flux query, and I even saved the resultant rolling count to a dashboard.
from(bucket: "myBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
(r._measurement == "measurementA"))
|> filter(fn: (r) =>
(r._field == "booleanAttributeX"))
|> window(
every: 15m,
period: 3h,
timeColumn: "_time",
startColumn: "_start",
stopColumn: "_stop",
createEmpty: true,
)
|> count()
|> yield(name: "count")
|> to(bucket: "myBucket", org: "myOrg")
Results in the following scatterplot.
My hope was that I could just copy-paste this as a new task and get my nice new aggregated dataset. After resolving a couple of legible syntax errors, I settled on the following task definition:
option v = {timeRangeStart: -12h, timeRangeStop: now()}
option task = {name: "blech", every: 15m}
from(bucket: "myBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
(r._measurement == "measurementA"))
|> filter(fn: (r) =>
(r._field == "booleanAttributeX"))
|> window(
every: 15m,
period: 3h,
timeColumn: "_time",
startColumn: "_start",
stopColumn: "_stop",
createEmpty: true,
)
|> count()
|> yield(name: "count")
|> to(bucket: "myBucket", org: "myOrg")
Unfortunately, I'm stuck on an error that I can't find any mention of anywhere: could not execute task run; Err: no time column detected: no time column detected.
If you could help me debug this task run error, or sidestep it by accomplishing this task in some other manner, I'll be very grateful.
I know I'm late here, but the to() function needs a _time column, while the count() aggregate you are adding returns _start and _stop columns to indicate the time frame of the count, not a _time column.
You can solve this by either adding |> duplicate(column: "_stop", as: "_time") just before your to function, or leveraging the aggregateWindow function which handles this for you.
|> aggregateWindow(every: 15m, fn: count)
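For the duplicate() option, the tail of the task would look roughly like this (same bucket, measurement, and org names as in the question):

```flux
from(bucket: "myBucket")
    |> range(start: -12h)
    |> filter(fn: (r) => r._measurement == "measurementA")
    |> filter(fn: (r) => r._field == "booleanAttributeX")
    |> window(every: 15m, period: 3h, createEmpty: true)
    |> count()
    |> duplicate(column: "_stop", as: "_time")  // restore the _time column to() requires
    |> to(bucket: "myBucket", org: "myOrg")
```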
References:
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/count
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/duplicate/
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/aggregatewindow/
