Aggregation functions per field in one query (Flux query language) - InfluxDB

I would like to do the following query:
SELECT last(field1), mean(field2), last(field3) FROM mymeasurement WHERE mykey='1' GROUP BY time(1h)
But I do not know how to do this using the Flux query language.
This is my current approach:
table0 = from(bucket: "mybucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mymeasurement")
|> filter(fn: (r) => r["mykey"] == "1")
|> filter(fn: (r) => r["_field"] == "field1")
|> aggregateWindow(every: 1h, fn: last, createEmpty: false)
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
table1 = from(bucket: "mybucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mymeasurement")
|> filter(fn: (r) => r["mykey"] == "1")
|> filter(fn: (r) => r["_field"] == "field2")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
table2 = from(bucket: "mybucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mymeasurement")
|> filter(fn: (r) => r["mykey"] == "1")
|> filter(fn: (r) => r["_field"] == "field3")
|> aggregateWindow(every: 1h, fn: last, createEmpty: false)
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
join(on: ["_time"], tables: {"0":table0,"1":table1, "2":table2})
Is there any better way to do it?
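A possible simplification, offered as an untested sketch: last() is used for both field1 and field3, so those two fields can share one branch, and the branches can be merged with union() and a single pivot() instead of join(). This also avoids join(), which accepts only two input streams, so the three-way join above may be rejected. The sketch assumes mykey is a tag and that all selected series share the same remaining tags; otherwise pivot() returns one table per tag combination.

```flux
// last() for field1 and field3 in one read
lasts = from(bucket: "mybucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "mymeasurement")
  |> filter(fn: (r) => r["mykey"] == "1")
  |> filter(fn: (r) => r["_field"] == "field1" or r["_field"] == "field3")
  |> aggregateWindow(every: 1h, fn: last, createEmpty: false)

// mean() for field2
means = from(bucket: "mybucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "mymeasurement")
  |> filter(fn: (r) => r["mykey"] == "1")
  |> filter(fn: (r) => r["_field"] == "field2")
  |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)

// Merge both streams, then turn each field into its own column keyed by _time
union(tables: [lasts, means])
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
```

Each branch is still its own from() chain rather than a shared variable, which is intended to keep the query shape that InfluxDB can push down to storage.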

Related

InfluxDB - Get daily max value

I have data with hydrological measurements.
I want to get the daily maximum water flow:
from(bucket: "API")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "hydro")
|> filter(fn: (r) => r["_field"] == "temperature")
|> filter(fn: (r) => r["loc"] == "XXX")
|> aggregateWindow(every: v.windowPeriod, fn: max, createEmpty: false)
|> yield(name: "max")
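For some reason, this sometimes returns multiple values per day, but not always.
How do I get only the maximum entry per day?
You need to set the every parameter of aggregateWindow() to 1d. v.windowPeriod is a dashboard variable whose value the UI derives from the selected time range, so it can produce windows shorter than a day: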
from(bucket: "API")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "hydro")
|> filter(fn: (r) => r["_field"] == "temperature")
|> filter(fn: (r) => r["loc"] == "XXX")
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
|> yield(name: "max")
See the Flux documentation for more details.
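One more detail that may matter for daily values: 1d windows are aligned to midnight UTC by default. If days should follow local time, here is a sketch assuming your InfluxDB version ships the Flux timezone package (Europe/Berlin is only an example zone):

```flux
import "timezone"

// Align window boundaries to local midnight instead of UTC (example zone)
option location = timezone.location(name: "Europe/Berlin")

from(bucket: "API")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "hydro")
  |> filter(fn: (r) => r["_field"] == "temperature")
  |> filter(fn: (r) => r["loc"] == "XXX")
  |> aggregateWindow(every: 1d, fn: max, createEmpty: false)
  |> yield(name: "max")
```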

Math with full query in Flux

I have power sensors for individual devices in my house, such as the fridge or the PC, and one sensor that measures the consumption of the whole house.
What I am trying to build is a pie chart showing the usage of each device. That works great. The only problem is that I now need to calculate a "rest" or "others" value.
For that, I want to take the number from the whole-house sensor and subtract all the other values.
I have two separate queries that give me the two numbers. I just can't find a way to subtract one from the other.
The queries are as follows:
Full sensor:
from(bucket: "hoi2c")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "power_usage")
|> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
|> filter(fn: (r) => r["type"] == "sh_em3-01")
|> aggregateWindow(every: 100y, fn: sum, createEmpty: false)
|> yield(name: "sum")
Sum of all other devices:
from(bucket: "hoi2c")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "power_usage" or r["_measurement"] == "devices_power_usage")
|> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
|> filter(fn: (r) => r["type"] == "sh_plug_wohnwand" or r["type"] == "sh_plug_office2" or r["type"] == "sh_plug_office1" or r["type"] == "sh_plug_kuehlschrank" or r["type"] == "sh_plug_datacenter" or r["type"] == "sh1_plpm_gartenhaus")
|> group(columns: ["_field"])
|> aggregateWindow(every: 100y, fn: sum, createEmpty: false)
|> yield(name: "sum")
Does anyone have an idea how I can achieve that?
Best Regards
Lasse
This should work. You could save each whole result to a variable like this:
full_sensor = from(bucket: "hoi2c")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "power_usage")
|> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
|> filter(fn: (r) => r["type"] == "sh_em3-01")
|> aggregateWindow(every: 100y, fn: sum, createEmpty: false)
|> yield(name: "sum")
other = from(bucket: "hoi2c")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "power_usage" or r["_measurement"] == "devices_power_usage")
|> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
|> filter(fn: (r) => r["type"] == "sh_plug_wohnwand" or r["type"] == "sh_plug_office2" or r["type"] == "sh_plug_office1" or r["type"] == "sh_plug_kuehlschrank" or r["type"] == "sh_plug_datacenter" or r["type"] == "sh1_plpm_gartenhaus")
|> group(columns: ["_field"])
|> aggregateWindow(every: 100y, fn: sum, createEmpty: false)
|> yield(name: "sum")
full_sensor - other
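The last line (full_sensor - other) is unlikely to work as written, because Flux does not define - between two table streams. A sketch of one way to do the subtraction, assuming _value is a float in both queries: reduce each query to a single record with findRecord(), subtract the scalar values, and build a one-row table with array.from(). The names totalRecord and devicesRecord and the label "rest" are illustrative, and the sketch does not handle the case where either query returns no rows.

```flux
import "array"

// Total consumption measured by the whole-house sensor over the selected range
totalRecord = from(bucket: "hoi2c")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "power_usage")
  |> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
  |> filter(fn: (r) => r["type"] == "sh_em3-01")
  |> group()
  |> sum()
  |> findRecord(fn: (key) => true, idx: 0)

// Combined consumption of all individually metered devices
devicesRecord = from(bucket: "hoi2c")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "power_usage" or r["_measurement"] == "devices_power_usage")
  |> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
  |> filter(fn: (r) => r["type"] == "sh_plug_wohnwand" or r["type"] == "sh_plug_office2" or r["type"] == "sh_plug_office1" or r["type"] == "sh_plug_kuehlschrank" or r["type"] == "sh_plug_datacenter" or r["type"] == "sh1_plpm_gartenhaus")
  |> group()
  |> sum()
  |> findRecord(fn: (key) => true, idx: 0)

// Emit the difference as a one-row table for the pie chart
array.from(rows: [{_field: "rest", _value: totalRecord._value - devicesRecord._value}])
```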

Why is this Flux query faster?

The following query...
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
is much faster (about 100 ms vs. 3000 ms, a factor of 30) than this equivalent query (on InfluxDB Cloud):
basis = from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
DATA = basis
|> filter(fn: (r) => r._measurement == "DATA")
DATA
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
DATA
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
What is the reason? I would expect the second query to be faster (or at least as fast), since I naively assumed it would be optimized because the common filtered data DATA is reused. Instead, this seems to confuse InfluxDB and stop pushdown processing, making the second query slower.
Why is that?
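A plausible explanation, not confirmed here: the planner can only merge filter() and aggregateWindow() into the storage read while each source has a single consumer. Because DATA feeds two branches, the later filters and the windowed mean likely run in the Flux engine on all rows that DATA returns. A sketch that keeps the shared logic without a shared source is a function that builds a new from() chain per call; airPressure is a hypothetical name, and the sketch assumes the code argument inside filter() is resolved for pushdown the same way dashboard variables are.

```flux
// Hypothetical helper: every call builds its own from() chain, so each
// branch has a single consumer and can be planned independently
airPressure = (code) => from(bucket: "mybucket")
  |> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
  |> filter(fn: (r) => r._measurement == "DATA")
  |> filter(fn: (r) => r["code"] == code)
  |> filter(fn: (r) => r._field == "airPressure")
  |> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
  |> sort(columns: ["_time"])

airPressure(code: "88820MS") |> yield(name: "PUB_88820MS")
airPressure(code: "86900MS") |> yield(name: "PUB_86900MS")
```

Timing both versions on your own data is the only reliable check.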

InfluxDB aggregateWindow with spread very slow

Can someone with more experience with InfluxDB tell me why this query takes 20 seconds to execute:
from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
while this one returns results instantly (0.1s)
from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
The only difference is the aggregation function (spread vs. max). What I'm trying to do is get the daily power consumption / PV production for the last 31 days. I would assume this is a pretty simple query, but I guess I'm wrong about that. Please advise on how to improve performance.
I'm using a dockerized InfluxDB 2.1.1.
EDIT:
I ended up using this crude approach: get the max and min separately and calculate the difference myself. It runs in 0.12 s.
maxes = from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
mins = from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: min, createEmpty: false)
join(
tables: {min:mins, max:maxes},
on: ["_time"]
)
|> map(fn: (r) => ({ r with _value: r._value_max - r._value_min}))
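A likely reason for the original gap: InfluxDB 2.x pushes aggregateWindow() down to storage only for a limited set of functions (such as min, max, mean, sum, count, first and last), and spread() does not appear to be one of them, so all raw points in the range are read and aggregated in Flux. The min/max workaround above can also be written without duplicating the filter chain; dailyAgg is a hypothetical helper, and the sketch assumes that passing min or max through a parameter is still pushed down.

```flux
// Hypothetical helper: one complete from() chain per call, aggregated per day
dailyAgg = (agg) => from(bucket: "FOO")
  |> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
  |> filter(fn: (r) => r["_measurement"] == "bar")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["device"] == "baz")
  |> filter(fn: (r) => r["type"] == "Import")
  |> aggregateWindow(every: 1d, fn: agg, createEmpty: false)

// Daily spread = daily max - daily min; join() suffixes _value with the table names
join(tables: {min: dailyAgg(agg: min), max: dailyAgg(agg: max)}, on: ["_time"])
  |> map(fn: (r) => ({r with _value: r._value_max - r._value_min}))
```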

InfluxDB Flux join series

I have the following data in InfluxDB:
server,operation=ADD queryMs=7.9810 1620608972904452000
server,operation=GET queryMs=12.2430 1620608972909339200
server,operation=UPDATE queryMs=11.5780 1620608972909655400
server,operation=ADD queryMs=11.2460 1620608972910445700
server,operation=GET queryMs=15.0620 1620608972911305000
etc...
So in my graph I see three series, one per operation.
I want to get one series covering all operations.
I tried |> group(columns: ["_field"]), which gives what I need, but the query is extremely slow!
from(bucket: "initial")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "server")
|> filter(fn: (r) => r["_field"] == "queryMs")
|> group(columns: ["_field"])
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
Any fast solutions for my problem?
This works faster:
union(tables: [
from(bucket: "initial")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "server")
|> filter(fn: (r) => r["_field"] == "queryMs")
|> filter(fn: (r) => r["operation"] == "GET")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false),
from(bucket: "initial")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "server")
|> filter(fn: (r) => r["_field"] == "queryMs")
|> filter(fn: (r) => r["operation"] == "ADD")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false),
from(bucket: "initial")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "server")
|> filter(fn: (r) => r["_field"] == "queryMs")
|> filter(fn: (r) => r["operation"] == "UPDATE")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
])
|> drop(columns:["operation"])
|> sort(columns: ["_time"], desc: false)
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "mean")
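A likely cause of the original slowdown: group() placed before aggregateWindow() prevents the windowed mean from being pushed down to storage, while the union() version aggregates each series first. A sketch of the same idea that does not list every operation value: take the windowed mean per series, then average the results that share a timestamp. Grouping on _time for the second step avoids running aggregateWindow() twice, which may shift points into the next window because the first pass stamps each row with its window's stop time. Like the union() answer, this averages per-operation means without weighting by point count, so it can differ from the mean over all raw points.

```flux
from(bucket: "initial")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "server")
  |> filter(fn: (r) => r["_field"] == "queryMs")
  // One mean per operation and window, before any regrouping
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  // Average the per-operation means that share a window timestamp
  |> group(columns: ["_time"])
  |> mean()
  // Collapse into a single series ordered by time
  |> group()
  |> sort(columns: ["_time"])
  |> yield(name: "mean")
```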
