Query last value in Flux - InfluxDB

I'm trying to get the last value from some IoT sensors, and I achieved an intermediate result with the following Flux query:
from(bucket:"mqtt-bucket")
|> range(start:-10m )
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["thingy"] == "things/green-1/shadow/update"
or r["thingy"] == "things/green-3/shadow/update"
or r["thingy"] == "things/green-2/shadow/update")
|> filter(fn: (r) => r["_field"] == "data")
|> filter(fn: (r) => r["appId"] == "TEMP" or r["appId"] == "HUMID")
|> toFloat()
|> last()
The problem: I would like to get the last measured value independently of any time range.
I saw in the docs that there is no way to make range() unbounded. Maybe there is a workaround?

I just found this:
from(bucket: "stockdata")
|> range(start: 0)
|> filter(fn: (r) => r["_measurement"] == "nasdaq")
|> filter(fn: (r) => r["symbol"] == "OPEC/ORB")
|> last()
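
Applying the same idea to the original query: range(start: 0) starts the search at the Unix epoch (1970-01-01), which is effectively unbounded in practice. A sketch reusing the filters from the question (note that scanning from the epoch over a large bucket can be slow):

from(bucket:"mqtt-bucket")
|> range(start: 0)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["thingy"] == "things/green-1/shadow/update"
or r["thingy"] == "things/green-3/shadow/update"
or r["thingy"] == "things/green-2/shadow/update")
|> filter(fn: (r) => r["_field"] == "data")
|> filter(fn: (r) => r["appId"] == "TEMP" or r["appId"] == "HUMID")
|> toFloat()
|> last()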

Related

how to return rows where value is 0 and a previous value in the past 7 days was > 0?

I have a measurement for ICMP responses which has percent_packetloss, packets_received, packets_sent.
I want to query for rows where percent_packetloss is 100, but only if the ip previously had less than 100% packet loss at any time in the previous 7 days.
Something like this:
from(bucket: "customerData")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] == 100)
|> keep(columns: ["_value", "ip"])
from(bucket: "customerData")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["ip"] == <ip from previous query>)
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] < 100)
Return the rows from the first query only if the second query returns results.
Can this type of operation be done in Flux, and if so, how?
I believe this solves the request:
alerts = from(bucket: "customerData")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] == 100)
noalerts = from(bucket: "customerData")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "packets_received" and r["_value"] > 0)
join(
tables: {alerts, noalerts},
on: ["ip"],
method: "inner"
)
|> keep(columns:["_time_alerts", "_value_alerts", "_value_noalerts", "account_alerts", "customer_alerts", "model_alerts", "site_alerts", "ip", "vendor_alerts"])
|> distinct(column: "ip")
|> sort(columns: ["_time"], desc: true)
I have refined and confirmed the following query meets the need.
The first query returns all rows with 100% loss. The second query returns rows where packets received were > 0 in the past 7 days. The two queries are then joined on ip, and the matching rows from the first query are returned.
alerts = from(bucket: "customerData")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] == 100)
noalerts = from(bucket: "customerData")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "packets_received" and r["_value"] > 0)
|> distinct(column: "ip")
join(
tables: {alerts, noalerts},
on: ["ip"],
method: "inner"
)
|> keep(columns:["_time", "account_alerts", "_value_alerts", "customer_alerts", "model_alerts", "site_alerts", "ip", "vendor_alerts"])
|> rename(columns: {"account_alerts": "account", "_value_alerts": "percent_packet_loss", "customer_alerts": "customer", "site_alerts": "site", "vendor_alerts": "vendor", "model_alerts": "model"})
|> sort(columns: ["_time"], desc: true)

Flux query: iterate a filter from a list

I'm looking for a way to filter by looping/iterating over a list.
Is that possible?
The table tophdd contains two entries, but I can't filter on these two entries with a regex.
tophdd = from(bucket: v.bucket)
|>range(start: v.timeRangeStart, stop: v.timeRangeStop)
|>filter(fn: (r) => r._measurement == "HDDID")
|>filter(fn: (r) => r.serial == "${Serial}")
|>filter(fn: (r) => r._field == "HDDID_IOPS")
|>highestMax(n:2,groupColumns: ["HDDID"])
|>keep(columns: ["HDDID" ])
|>from(bucket: v.bucket)
|>range(start: v.timeRangeStart, stop: v.timeRangeStop)
|>filter(fn: (r) => r._measurement == "HDDID")
|>filter(fn: (r) => r.serial == "${Serial}")
|>filter(fn: (r) => r._field == "HDDID_IOPS")
|>filter(fn: (r) => r.HDDID = =~ /"${tophdd}"/)
|>aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean)
I am looking to filter like this:
filter(fn: (r) => r.HDDID = =~ /"${tophdd}"/)
Is it possible to filter from a list?
Many thanks,
Looks like you just have a duplicate equals sign (= =) there. Try updating the query as follows:
filter(fn: (r) => r.HDDID =~ /"${tophdd}"/)
You can extract the column values into an array using findColumn and then use the contains function in the filter, e.g.:
tophdd = from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> ...
|> keep(columns: ["HDDID" ])
|> findColumn(fn: (key) => true, column: "HDDID")
from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "HDDID")
|> ...
|> filter(fn: (r) => contains(set: tophdd, value: r.HDDID))
|> aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean)
Please note that performance may be suboptimal, since contains() is not a pushdown operation.

Why is this Flux query faster?

The following query...
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
is much faster (around 100 ms vs. 3000 ms, a factor of 30) than this equivalent query (on InfluxDB Cloud):
basis = from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
DATA = basis
|> filter(fn: (r) => r._measurement == "DATA")
DATA
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
DATA
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
What is the reason? I would expect the second query to be faster (or at least just as fast), since I naively assumed it would be optimized (the common filtered table DATA being reused). Instead, this seems to confuse InfluxDB and prevent pushdown processing, making the second query slower.
Why is that?
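
One way to check where the time goes (a suggestion, not from the original post) is the Flux profiler package, which emits per-query and per-operator statistics alongside the results and makes visible which operations were pushed down to storage. A sketch applied to the first sub-query:

import "profiler"

// emit query- and operator-level timing statistics as extra result tables
option profiler.enabledProfilers = ["query", "operator"]

from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> yield(name: "PUB_88820MS")

Comparing the operator profiles of the two variants should show whether the filter/aggregateWindow chain runs inside storage (pushed down) or in the Flux engine.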

InfluxDB aggregateWindow with spread very slow

Can someone with more experience with InfluxDB tell me why this query takes 20 seconds to execute:
from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
while this one returns results instantly (0.1s)
from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
The only difference is the aggregation function (spread vs. max). What I'm trying to do is get the daily power consumption / PV production for the last 31 days. I would assume this is a pretty simple query, but I guess I'm wrong about that. Please advise on how to improve performance.
I'm using a dockerized InfluxDB 2.1.1.
EDIT:
I ended up using this crude approach, which gets max and min separately and calculates the spread myself. It runs in 0.12 s.
maxes = from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
mins = from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: min, createEmpty: false)
join(
tables: {min:mins, max:maxes},
on: ["_time"]
)
|> map(fn: (r) => ({ r with _value: r._value_max - r._value_min}))

Get difference using Flux

I have a simple counter which is stored in an InfluxDB database. Now I would like to get the difference in the counter value between two points in time, so the result should be just a single value.
I tried the following query:
from(bucket: "influxdb")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["host"] == "telegraf-1-18-1")
|> filter(fn: (r) => r["topic"] == "shellies/shellyplug-s-DDE23E/relay/0/energy")
|> difference()
But this does not give the difference between the two counter values (I actually have no idea what the result is supposed to represent).
Can anyone give me a hint on how to use difference correctly?
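
difference() emits one row per adjacent pair of points, so over a range containing many points you get many per-pair deltas rather than a single value. Since consecutive differences telescope, summing them yields the total change over the window (last minus first). A sketch based on the query above (hedged, not a confirmed answer from the thread):

from(bucket: "influxdb")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["host"] == "telegraf-1-18-1")
|> filter(fn: (r) => r["topic"] == "shellies/shellyplug-s-DDE23E/relay/0/energy")
|> difference()
|> sum()

The difference() call produces v[i] - v[i-1] for each consecutive pair, and sum() collapses these into one value per series, equal to the last counter reading minus the first.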
