InfluxQL to new Flux task with resample and group - influxdb

I have been trying for 6 hours to convert this InfluxQL continuous query to Flux.
Can anyone help?
CREATE CONTINUOUS QUERY trade_to_candles_60 ON mydb
RESAMPLE EVERY 10s FOR 10m
BEGIN
  SELECT first(price) AS open, last(price) AS close, max(price) AS high, min(price) AS low, sum(amount) AS volume
  INTO candles_1m
  FROM trades
  GROUP BY time(1m), pair, exchange
END

Following the sample provided by influxdata in their documentation (here), I would split this continuous query into one Flux task per aggregated field, like this:
import "experimental"
options task = {
name: "trade_to_candles_60_open",
every: 10s
}
from(bucket: "mydb/")
|> range(start: experimental.subDuration(d: 10m, from: now()))
|> filter(fn: (r) =>
r._measurement == "trades" and
r._field == "price"
)
|> group(columns: ["pair","exchange"])
|> aggregateWindow(every: 1m, fn: first)
|> set(key: "_field", as: "open")
|> to(bucket: "mydb/candles_1m")
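The other tasks follow the same pattern. For example, a sketch of the matching "close" task (high, low, and volume are analogous; the volume task would filter on the amount field and aggregate with fn: sum):
import "experimental"

option task = {
  name: "trade_to_candles_60_close",
  every: 10s,
}

from(bucket: "mydb/")
  |> range(start: experimental.subDuration(d: 10m, from: now()))
  |> filter(fn: (r) => r._measurement == "trades" and r._field == "price")
  |> group(columns: ["pair", "exchange"])
  // last(price) per 1m window gives the candle close
  |> aggregateWindow(every: 1m, fn: last)
  |> set(key: "_field", value: "close")
  |> to(bucket: "mydb/candles_1m")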
Personally, I don't think it's acceptable that influxdata provides no CQ-to-Flux-task converter. I'm trying to migrate from 1.8 to 2.0 and it's a nightmare: there is no way to test Flux tasks in 1.8 and no way to use CQs in 2.0 to ease the migration. Every CQ has to be rewritten by hand as a Flux task, and I have hundreds of CQs...

Related

How to overwrite the latest aggregated row in InfluxDB?

I have an InfluxDB task that aggregates yearly data. It runs every minute, as current data are still changing.
option task = {name: "Yearly Throughput", every: 1m}
from(bucket: "my_bucket")
|> range(start: -3y)
|> filter(fn: (r) => r._measurement == "throughput")
|> aggregateWindow(every: 1y, fn: sum)
|> fill(value: 0)
|> set(key: "_measurement", value: "yearly_throughput")
|> to(bucket: "my_bucket")
Because the last row changes every minute, a new point is written to the yearly_throughput measurement on every run (the _time of the last row is different each time). Any idea how to overwrite the latest year's point instead?
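One way to get overwrites instead of new points (a sketch, using aggregateWindow's timeSrc parameter): stamp each yearly aggregate with its window start instead of its window stop. The stop of the current, in-progress year window is clipped to now() and so moves on every run, while the start stays fixed at the year boundary; since InfluxDB overwrites points that share a series and timestamp, a stable _time makes to() replace the previous value.
option task = {name: "Yearly Throughput", every: 1m}

from(bucket: "my_bucket")
  |> range(start: -3y)
  |> filter(fn: (r) => r._measurement == "throughput")
  // timeSrc: "_start" stamps each yearly sum with the stable window start
  // instead of the default "_stop", which drifts for the in-progress year
  |> aggregateWindow(every: 1y, fn: sum, timeSrc: "_start")
  |> fill(value: 0)
  |> set(key: "_measurement", value: "yearly_throughput")
  |> to(bucket: "my_bucket")
Note that the oldest window is still clipped by the moving range start, so its point drifts in the same way; anchoring the range start to a fixed year boundary avoids that.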

Flux query very slow compared to InfluxQL (10x slower)

I'm upgrading from InfluxDB 1.x to 2.x (updating queries from InfluxQL to Flux syntax).
For very simple queries, performance drops dramatically when I query more than 500,000 points, and I'm not sure whether there's anything I can do to improve my queries to get better performance.
InfluxQL:
select last("y") AS "y" from "mydata".autogen."profile"
WHERE time >= '2019-01-01T00:00:00Z' and time <= '2019-01-07T23:59:59Z'
GROUP BY time(1s) FILL(none)
Flux:
data = from(bucket: "mydata")
  |> range(start: 2019-01-01T00:00:00Z, stop: 2019-01-07T23:59:59Z)
  |> filter(fn: (r) => r._measurement == "profile")
  |> filter(fn: (r) => r._field == "y")
  |> aggregateWindow(every: 1s, fn: last, createEmpty: false)
  |> yield()
any advice?
You could try rebuilding the time series index with the command below:
influxd inspect build-tsi
See more details here.
The reason is that during the upgrade the meta and data files are migrated, but the indices are not. So "InfluxDB must build a new time series index (TSI). Depending on the volume of data present, this may take some time," according to the guide.

Grouping by increasing stateDuration resets using Flux in InfluxDB

I am recording the period between application heartbeats in InfluxDB.
The "target" period is 2000 ms.
If the period is above 2750 ms, it is defined as a "lag event".
My end objective is to run statistics on how long we run without lag events.
I switched from InfluxQL to Flux so that I could use the stateDuration() function.
Using the method below, I am able to collect the increasing durations. At lag events, state_duration is reset to -1.
from(bucket: "sampledb/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) =>
      r._measurement == "timers" and
      r._field == "HeartbeatMs" and
      r.character == "Tarek"
  )
  |> stateDuration(fn: (r) => r._value <= 2750, column: "state_duration", unit: 1s)
  |> keep(columns: ["_time", "state_duration"])
At this point I would like to collect max(state_duration) for each stretch between lag events, and this is where I get stuck: I need to group by every new stateDuration sequence, i.e. by each increasing run of state_duration.
I was thinking it might be possible to use reduce() or map() to inject a sequence number to group by, incrementing it whenever state_duration is -1.
Below is a graph of state_duration when running the Flux query; I am basically trying to capture the value at the top of each peak.
Any help is appreciated, including doing this e.g. in InfluxQL or with Continuous Queries.
The data looks like this when exported to CSV:
"time","timers.HeartbeatMs","timers.character"
"2021-01-12T14:49:34.000+01:00","2717","Tarek"
"2021-01-12T14:49:36.000+01:00","1282","Tarek"
"2021-01-12T14:49:38.000+01:00","2015","Tarek"
"2021-01-12T14:49:40.000+01:00","1984","Tarek"
"2021-01-12T14:49:42.000+01:00","2140","Tarek"
"2021-01-12T14:49:44.000+01:00","1937","Tarek"
"2021-01-12T14:49:46.000+01:00","2405","Tarek"
"2021-01-12T14:49:48.000+01:00","2312","Tarek"
"2021-01-12T14:49:50.000+01:00","1453","Tarek"
"2021-01-12T14:49:52.000+01:00","1890","Tarek"
"2021-01-12T14:49:54.000+01:00","2077","Tarek"
"2021-01-12T14:49:56.000+01:00","2250","Tarek"
"2021-01-12T14:49:59.000+01:00","2360","Tarek"
"2021-01-12T14:50:00.000+01:00","1453","Tarek"
"2021-01-12T14:50:02.000+01:00","1952","Tarek"
"2021-01-12T14:50:04.000+01:00","2108","Tarek"
"2021-01-12T14:50:06.000+01:00","2485","Tarek"
"2021-01-12T14:50:08.000+01:00","1437","Tarek"
"2021-01-12T14:50:10.000+01:00","2421","Tarek"
"2021-01-12T14:50:12.000+01:00","1483","Tarek"
"2021-01-12T14:50:14.000+01:00","2344","Tarek"
"2021-01-12T14:50:17.000+01:00","2437","Tarek"
"2021-01-12T14:50:18.000+01:00","1092","Tarek"
"2021-01-12T14:50:20.000+01:00","1969","Tarek"
"2021-01-12T14:50:22.000+01:00","2359","Tarek"
"2021-01-12T14:50:24.000+01:00","2140","Tarek"
"2021-01-12T14:50:27.000+01:00","2421","Tarek"
There are two ways I can think of. One is to look for the inverted state. The other is to use elapsed() to find the interval, plus timeShift() to emulate LAG().
I don't like the latter, though the first isn't intuitive either :-(. I really hope features like LAG() or CurrentRecordIndex() become available in Flux.
from(bucket: "sampledb/autogen")
  |> range(start: -1h)
  |> filter(fn: (r) =>
      r._measurement == "timers" and
      r._field == "HeartbeatMs" and
      r.character == "Tarek"
  )
  // Look for the inverted state
  |> stateDuration(fn: (r) => r._value > 2750, column: "inverted_state_duration", unit: 1s)
  |> keep(columns: ["_time", "inverted_state_duration"])
  // Clear out records of the periods you are after
  |> filter(fn: (r) => r["inverted_state_duration"] == -1)
  // Calculate the gap duration with elapsed()
  |> elapsed(columnName: "state_duration")
  // ${ ... } is a placeholder: substitute a threshold larger than both the
  // stateDuration unit and the normal recording interval
  |> filter(fn: (r) => r["state_duration"] > ${ max(stateDuration.unit, record interval) })

Why can't InfluxDB v2.0 write to a bucket after using the join() function?

Here is my Flux script. When I run it there is no error, but there is no data in bucket "output-test-3", while data does exist in bucket "output-test-4" :(
I have been troubled by this problem for a long time. Can anyone solve it?
option task = {name: "join-test-1", every: 5m, offset: 5s}
max_connections = from(bucket: "Node-exporter")
|> range(start: -task.every)
|> filter(fn: (r) =>
(r["_measurement"] == "go_info"))
|> last()
|> to(bucket: "output-test-4")
used_connections = from(bucket: "Node-exporter")
|> range(start: -task.every)
|> filter(fn: (r) =>
(r["_measurement"] == "go_goroutines"))
|> last()
|> to(bucket: "output-test-4")
a = join(tables: {max_connections: max_connections, used_connections: used_connections}, on:
["_time", "_start", "_measurement", "_stop", "_field"])
|> to(bucket: "output-test-3")
When join() combines two queries a and b, conflicting columns such as _field, _measurement, and _value are automatically renamed to _field_a, _field_b, _value_a, _value_b, and so on. When InfluxDB writes to a bucket, the three columns _field, _measurement, and _value must be present, but for the reason above those columns have disappeared. So the easiest way to solve this problem is to use the map() function to recreate these three columns. Their contents can be whatever you specify; just bear that in mind when you later use the data.
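A sketch of what that can look like for the task above (the measurement and field names are made-up assumptions; note the join here is on the time columns only, since _measurement and _field differ between the two streams and joining on them matches nothing):
a = join(
    tables: {max_connections: max_connections, used_connections: used_connections},
    on: ["_time", "_start", "_stop"],
)
  |> map(fn: (r) => ({r with
      // recreate the three required columns; the names and the derived
      // value are illustrative assumptions
      _measurement: "connections",
      _field: "used_ratio",
      _value: float(v: r._value_used_connections) / float(v: r._value_max_connections),
  }))
  |> keep(columns: ["_time", "_measurement", "_field", "_value"])
  |> to(bucket: "output-test-3")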

How do I "check" (alert on) an aggregate in InfluxDB 2.0 over a rolling window?

I want to raise an alarm when the count of a particular kind of event is less than 5 for the 3 hours leading up to the moment the check is evaluated, but I need to do this check every 15 minutes.
Since I need to check more frequently than the span of time I'm measuring, I can't do this based on my raw data (according to the docs, "[the schedule] interval matches the aggregate function interval for the check query"). But I figured I could use a "task" to transform my data into a form that would work.
I was able to aggregate the data in the way that I hoped via a flux query, and I even saved the resultant rolling count to a dashboard.
from(bucket: "myBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
(r._measurement == "measurementA"))
|> filter(fn: (r) =>
(r._field == "booleanAttributeX"))
|> window(
every: 15m,
period: 3h,
timeColumn: "_time",
startColumn: "_start",
stopColumn: "_stop",
createEmpty: true,
)
|> count()
|> yield(name: "count")
|> to(bucket: "myBucket", org: "myOrg")
This results in the following scatterplot.
My hope was that I could just copy-paste this as a new task and get my nice new aggregated dataset. After resolving a couple of legible syntax errors, I settled on the following task definition:
option v = {timeRangeStart: -12h, timeRangeStop: now()}
option task = {name: "blech", every: 15m}
from(bucket: "myBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) =>
(r._measurement == "measurementA"))
|> filter(fn: (r) =>
(r._field == "booleanAttributeX"))
|> window(
every: 15m,
period: 3h,
timeColumn: "_time",
startColumn: "_start",
stopColumn: "_stop",
createEmpty: true,
)
|> count()
|> yield(name: "count")
|> to(bucket: "myBucket", org: "myOrg")
Unfortunately, I'm stuck on an error that I can't find any mention of anywhere: could not execute task run; Err: no time column detected: no time column detected.
If you could help me debug this task run error, or sidestep it by accomplishing this task in some other manner, I'll be very grateful.
I know I'm late here, but the to() function needs a _time column, and the count() aggregate you are adding returns _start and _stop columns to indicate the time frame of the count, not a _time column.
You can solve this either by adding |> duplicate(column: "_stop", as: "_time") just before your to() function, or by leveraging the aggregateWindow() function, which handles this for you:
|> aggregateWindow(every: 15m, fn: count)
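Applied to the task above, the duplicate() route would end like this:
  |> count()
  |> duplicate(column: "_stop", as: "_time")
  |> to(bucket: "myBucket", org: "myOrg")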
References:
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/count
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/duplicate/
https://v2.docs.influxdata.com/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/aggregatewindow/
