InfluxDB Flux query is unexpectedly slow when multiple tables are returned - influxdb

I am moving from InfluxDB 1.x to InfluxDB 2.x. The application shows a chart of the min, max, and mean of measurements, so I would like to retrieve that information with a single query.
The query that I use is based on this example and looks like this:
data = from(bucket: "my-bucket")
|> range(start: 2022-09-01T00:00:00Z, stop: 2022-09-10T00:00:00Z)
|> filter(fn: (r) => r["_measurement"] == "temperature")
valueMin = data
|> aggregateWindow(every: 1d, fn: min, createEmpty: false)
|> yield(name: "min")
valueMax = data
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
|> yield(name: "max")
valueAvg = data
|> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
|> yield(name: "avg")
When I run the query with one output, the query time is 0.04 seconds, but when I add another table the query time increases to 13(!) seconds.
The following query runs in 0.04 seconds:
data = from(bucket: "my-bucket")
|> range(start: 2022-09-01T00:00:00Z, stop: 2022-09-10T00:00:00Z)
|> filter(fn: (r) => r["_measurement"] == "temperature")
valueMin = data
|> aggregateWindow(every: 1d, fn: min, createEmpty: false)
|> yield(name: "min")
While the next query takes 13 seconds:
data = from(bucket: "my-bucket")
|> range(start: 2022-09-01T00:00:00Z, stop: 2022-09-10T00:00:00Z)
|> filter(fn: (r) => r["_measurement"] == "temperature")
valueMin = data
|> aggregateWindow(every: 1d, fn: min, createEmpty: false)
|> yield(name: "min")
valueMax = data
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
|> yield(name: "max")
What is wrong here, and how can the Flux query performance be improved?
I am currently on influxd version 2.6.1.
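One workaround, offered as a hedged sketch: InfluxDB's storage-level pushdown optimizations apply only when `range`, `filter`, and `aggregateWindow` form an unbroken chain from `from()`; assigning the base query to a variable and reusing it for several aggregates forces the data through the general-purpose query engine instead. Writing each aggregate as its own chain (bucket name and time range taken from the question) keeps every branch pushdown-eligible:

```flux
// Each chain reads directly from storage, so the
// range |> filter |> aggregateWindow pattern can be
// pushed down instead of buffered in memory.
from(bucket: "my-bucket")
    |> range(start: 2022-09-01T00:00:00Z, stop: 2022-09-10T00:00:00Z)
    |> filter(fn: (r) => r["_measurement"] == "temperature")
    |> aggregateWindow(every: 1d, fn: min, createEmpty: false)
    |> yield(name: "min")

from(bucket: "my-bucket")
    |> range(start: 2022-09-01T00:00:00Z, stop: 2022-09-10T00:00:00Z)
    |> filter(fn: (r) => r["_measurement"] == "temperature")
    |> aggregateWindow(every: 1d, fn: max, createEmpty: false)
    |> yield(name: "max")

from(bucket: "my-bucket")
    |> range(start: 2022-09-01T00:00:00Z, stop: 2022-09-10T00:00:00Z)
    |> filter(fn: (r) => r["_measurement"] == "temperature")
    |> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
    |> yield(name: "avg")
```

The repetition looks redundant, but each branch can then be answered by a pushed-down storage read rather than by windowing raw rows in the Flux engine.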

Related

influx query: how to get historical average

I am a SQL native struggling with Flux syntax (philosophy?) once again. Here is what I am trying to do: plot values of a certain measurement as a ratio of their historical average (say, over the past month).
Here is as far as I have gotten:
from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> group(columns: ["query"])
|> aggregateWindow(every: v.windowPeriod, fn: sum)
|> timedMovingAverage(every: 1d, period: 30d)
I believe this produces an average over the past 30 days, for each day window. Now what I don't know how to do is divide the original data by these values in order to get the relative change, i.e. something like value(_time)/tma_value(_time).
Thanks to @Munun, I got the following code working. I made a few changes since my original post to make things work as I needed.
import "date"
t1 = from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> group(columns: ["query"])
|> aggregateWindow(every: 1h, fn: sum)
|> map(fn: (r) => ({r with window_value: float(v: r._value)}))
t2 = from(bucket: "secret_bucket")
|> range(start: date.sub(from: v.timeRangeStop, d: 45d), stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> mean(column: "_value")
|> group()
|> map(fn: (r) => ({r with avg_value: r._value}))
join(tables: {t1: t1, t2: t2}, on: ["query"])
|> map(fn: (r) => ({r with _value: (r.window_value - r.avg_value)/ r.avg_value * 100.0 }))
|> keep(columns: ["_value", "_time", "query"])
Here are a few steps you could try:
Re-add _time after the aggregate function so that you have the same number of records as the original:
|> duplicate(column: "_stop", as: "_time")
Calculate the ratio from the two data sources via join and map.
The final Flux could be:
t1 = from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
|> group(columns: ["query"])
|> aggregateWindow(every: v.windowPeriod, fn: sum)
|> timedMovingAverage(every: 1d, period: 30d)
|> duplicate(column: "_stop", as: "_time")
t2 = from(bucket: "secret_bucket")
|> range(start: v.timeRangeStart, stop:v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "pg_stat_statements_fw")
join(tables: {t1: t1, t2: t2}, on: ["hereIsTheTagName"])
|> map(fn: (r) => ({r with _value: r._value_t2 / r._value_t1 * 100.0}))

Why is this Flux query faster?

The following query...
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
is much faster (about 100 ms vs. 3000 ms, a factor of 30) than this equivalent query (on InfluxDB Cloud):
basis = from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
DATA = basis
|> filter(fn: (r) => r._measurement == "DATA")
DATA
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
DATA
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
What is the reason? I would expect the second query to be faster (or at least just as fast), since I naively assumed it would be optimized: the common filtered data DATA is reused. Instead, this seems to confuse InfluxDB and stop pushdown processing, making the second query slower.
Why is that?
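A hedged way to confirm where the time goes: Flux ships a `profiler` package that, when enabled, appends query-level and per-operator statistics to the results, which should show whether `aggregateWindow` executes as a pushed-down storage read or inside the query engine. A minimal sketch, reusing the bucket and filters from the question:

```flux
import "profiler"

// Emit per-query and per-operator timing tables
// alongside the normal query results.
option profiler.enabledProfilers = ["query", "operator"]

from(bucket: "mybucket")
    |> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
    |> filter(fn: (r) => r._measurement == "DATA")
    |> filter(fn: (r) => r["code"] == "88820MS")
    |> filter(fn: (r) => r._field == "airPressure")
    |> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
    |> yield(name: "PUB_88820MS")
```

Comparing the operator profile of the fast and slow variants should make the missing pushdown visible as extra engine-side operators and row counts.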

InfluxDB: return 0 instead of 'no results'

We are using InfluxDB for statistics and dashboards. We love it! Blazing fast and easy to integrate. However, we get stuck when we launch new features.
We have the following Flux query against a massive database with all "model_events", keyed by the business UUID. However, if the business doesn't have a car.created event, the query returns no results instead of a range of 0's. If it has even one car.created event, even outside the range, it returns a range of 0's. Is there a possibility to always get the range, even if the _measurement doesn't have a value?
from(bucket: "_events")
|> range(start: 2022-09-01, stop: 2022-09-11)
|> filter(fn: (r) => r["_measurement"] == "car.created")
|> filter(fn: (r) => r["business_uuid"] == "055ade92-ecd9-47b1-bf85-c1381d0afd22")
|> aggregateWindow(every: 1d, fn: count, createEmpty: true)
|> yield(name: "amount")
BTW.... a bit new to InfluxDB...
Maybe you could create a dummy table and union() it like:
import "experimental/array"
rows = [{_time: now(), _field: "someField", _value: 0}]
dummy = array.from(rows: rows)
data = from(bucket: "_events")
|> range(start: 2022-09-01, stop: 2022-09-11)
|> filter(fn: (r) => r["_measurement"] == "car.created")
|> filter(fn: (r) => r["business_uuid"] == "055ade92-ecd9-47b1-bf85-c1381d0afd22")
|> aggregateWindow(every: 1d, fn: count, createEmpty: true)
|> yield(name: "amount")
union(tables: [dummy, data])
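A hedged refinement of the dummy-table idea: for union() to line up with the real data, the dummy row should use the same column names as the aggregated series, and the union result (rather than the raw data) is what should be yielded. A sketch, where the _field name "events" is a hypothetical placeholder for whatever field the real series uses:

```flux
import "experimental/array"

// Guarantee that at least one table exists even when no
// car.created events match: union the aggregated data with a
// single zero-valued dummy row. Note the dummy row appears in
// the output, so callers may want to filter it out or give it
// a _time just outside the displayed window.
dummy = array.from(rows: [{_time: 2022-09-01T00:00:00Z, _field: "events", _value: 0}])

data = from(bucket: "_events")
    |> range(start: 2022-09-01, stop: 2022-09-11)
    |> filter(fn: (r) => r["_measurement"] == "car.created")
    |> filter(fn: (r) => r["business_uuid"] == "055ade92-ecd9-47b1-bf85-c1381d0afd22")
    |> aggregateWindow(every: 1d, fn: count, createEmpty: true)

union(tables: [dummy, data])
    |> yield(name: "amount")
```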

InfluxDB 2.0: How to calculate a daily value over a bigger timeframe?

I'm migrating from InfluxDB 1.8 to InfluxDB 2.0. I'm using an InfluxDB 2.0 database and Grafana to display the results.
The data I insert are the readings of my P1 meter. Although these are cumulative totals, I would like to calculate and display the daily usage.
What is being inserted is the current (gas usage) meter value. By taking the difference between the beginning and the end of the day, I get my daily usage.
I found a way to do this for one day with the spread function, but I can't get it working for a timeframe longer than one day, e.g. to display daily usage over a week.
[Screenshot: week results]
Anyone have an idea?
Query for 1 day:
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "Gas-usage")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> spread(column: "_value")
I did some checks on the 1.8 one and what works there is:
SELECT spread("value")
FROM "Gas-usage"
WHERE $timeFilter
GROUP BY time(1d) fill(null) tz('Europe/Berlin')
What is the equivalent of this query in InfluxDB 2.0?
Try changing your aggregate window, like this:
|> aggregateWindow(every: 1d, fn: mean)
Use the spread function inside your aggregateWindow function. It should look like this:
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "Gas-usage")
|> filter(fn: (r) => r["_field"] == "value")
|> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
from(bucket: "${bucket}")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "system")
|> filter(fn: (r) => r.host == "${host}")
|> filter(fn: (r) => r["_field"] == "uptime")
|> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
[Screenshot: result in Grafana]

Increasing byte counter for InfluxDB

I have a bytes counter being sent to InfluxDB and the below query to show the data:
from(bucket: "PolygonIoStreamTelemetry")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "bytes-received" or r["_measurement"] == "bytes-sent")
|> filter(fn: (r) => r["_field"] == "Value")
|> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
|> yield(name: "last")
This yields a straight line; however, I would like to get the bytes per sample, i.e. the difference between consecutive values.
Also, when the application restarts, this counter starts again from zero, so there would be a large negative difference for the first sample after a reset.
I see there is an increase() function; how would I use it in relation to the above?
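A hedged sketch of how increase() could fit here: increase() returns the cumulative sum of non-negative differences, so counter resets to zero do not show up as negative spikes, and a trailing difference() then turns that monotonic series into a per-window change. Built from the query in the question:

```flux
from(bucket: "PolygonIoStreamTelemetry")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "bytes-received" or r["_measurement"] == "bytes-sent")
    |> filter(fn: (r) => r["_field"] == "Value")
    // increase() sums only the non-negative deltas, so a counter
    // reset to zero does not appear as a large negative jump.
    |> increase()
    |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
    // Change of the monotonic series per window = bytes per window.
    |> difference()
    |> yield(name: "bytes_per_window")
```

If true bytes-per-sample (rather than per display window) is wanted, dropping the aggregateWindow step and keeping increase() |> difference() should give one delta per raw point.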