How to return rows where the value is 0 and a previous value in the past 7 days was > 0? (InfluxDB)

I have a measurement for ICMP responses which has percent_packetloss, packets_received, packets_sent.
I want to query for rows where percent_packetloss is 100, but only if the ip previously had less than 100% packet loss any time in the previous 7 days.
Something like this:
from(bucket: "customerData")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] == 100)
|> keep(columns: ["_value", "ip"])
from(bucket: "customerData")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["ip"] == <ip from previous query>)
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] < 100)
Return rows from the first query only if the second query returns at least one row.
Can this type of operation be done in Flux, and if so, how?
I believe this solves the request:
alerts = from(bucket: "customerData")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] == 100)
noalerts = from(bucket: "customerData")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "packets_received" and r["_value"] > 0)
join(
tables: {alerts: alerts, noalerts: noalerts},
on: ["ip"],
method: "inner"
)
|> keep(columns:["_time_alerts", "_value_alerts", "_value_noalerts", "account_alerts", "customer_alerts", "model_alerts", "site_alerts", "ip", "vendor_alerts"])
|> distinct(column: "ip")
|> sort(columns: ["_time"], desc: true)

I have refined and confirmed the following query meets the need.
The first query returns all rows with 100% loss. The second query returns rows where packets received were > 0 in the past 7 days. The two queries are then joined on ip, and the matching rows from the first query are returned.
alerts = from(bucket: "customerData")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] == 100)
noalerts = from(bucket: "customerData")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "packets_received" and r["_value"] > 0)
|> distinct(column: "ip")
join(
tables: {alerts: alerts, noalerts: noalerts},
on: ["ip"],
method: "inner"
)
|> keep(columns:["_time", "account_alerts", "_value_alerts", "customer_alerts", "model_alerts", "site_alerts", "ip", "vendor_alerts"])
|> rename(columns: {"account_alerts": "account", "_value_alerts": "percent_packet_loss", "customer_alerts": "customer", "site_alerts": "site", "vendor_alerts": "vendor", "model_alerts": "model"})
|> sort(columns: ["_time"], desc: true)

Related

Math with full query in Flux

I have some power sensors for devices like a fridge or PC in my house, and a "full" sensor that measures the complete house consumption.
What I am trying to achieve is a pie chart with the individual device usages for my house. That works great. The only problem is that I now need to calculate a "rest" or "others" value.
For that I want to take the number from the full sensor and subtract all the other values.
I have two individual queries that give me the two numbers; I just don't find a way to subtract one from the other.
The queries are as follows:
Full sensor:
from(bucket: "hoi2c")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "power_usage")
|> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
|> filter(fn: (r) => r["type"] == "sh_em3-01")
|> aggregateWindow(every: 100y, fn: sum, createEmpty: false)
|> yield(name: "sum")
Sum of all other devices:
from(bucket: "hoi2c")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "power_usage" or r["_measurement"] == "devices_power_usage")
|> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
|> filter(fn: (r) => r["type"] == "sh_plug_wohnwand" or r["type"] == "sh_plug_office2" or r["type"] == "sh_plug_office1" or r["type"] == "sh_plug_kuehlschrank" or r["type"] == "sh_plug_datacenter" or r["type"] == "sh1_plpm_gartenhaus")
|> group(columns: ["_field"])
|> aggregateWindow(every: 100y, fn: sum, createEmpty: false)
|> yield(name: "sum")
Does anyone have an idea how I can achieve that?
Best Regards
Lasse
This should work. You can save each whole result to a variable, but note that two streams cannot be subtracted directly with `-`; instead, join them and do the subtraction in a map():
full_sensor = from(bucket: "hoi2c")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "power_usage")
|> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
|> filter(fn: (r) => r["type"] == "sh_em3-01")
|> group(columns: ["_field"]) // align group keys with the other stream
|> aggregateWindow(every: 100y, fn: sum, createEmpty: false)
other = from(bucket: "hoi2c")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "power_usage" or r["_measurement"] == "devices_power_usage")
|> filter(fn: (r) => r["_field"] == "total_usage_no_neg")
|> filter(fn: (r) => r["type"] == "sh_plug_wohnwand" or r["type"] == "sh_plug_office2" or r["type"] == "sh_plug_office1" or r["type"] == "sh_plug_kuehlschrank" or r["type"] == "sh_plug_datacenter" or r["type"] == "sh1_plpm_gartenhaus")
|> group(columns: ["_field"])
|> aggregateWindow(every: 100y, fn: sum, createEmpty: false)
join(tables: {full: full_sensor, other: other}, on: ["_time", "_field"])
|> map(fn: (r) => ({r with _value: r._value_full - r._value_other}))
|> yield(name: "rest")

Why is this Flux query faster?

The following query...
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
is much faster (about 100 ms vs. 3000 ms, roughly a factor of 30) than this equivalent query (on InfluxDB Cloud):
basis = from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
DATA = basis
|> filter(fn: (r) => r._measurement == "DATA")
DATA
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
DATA
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
What is the reason? I would expect the second query to be faster (or at least just as fast), since I naively assumed the second query would be optimized (the common filtered stream DATA is reused). Instead, this seems to confuse InfluxDB and stop pushdown processing, making the second query slower.
Why is that?
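One way to see the difference, sketched here under the assumption that you can run queries from the InfluxDB UI or CLI, is Flux's built-in profiler package. With the profilers enabled, extra result tables describe how the query was executed; a fully pushed-down query typically shows a single merged storage read, while the variable-reuse version shows a bare range read followed by in-memory processing:

```flux
// Sketch: enable Flux profilers to inspect how a query is executed.
// Profiler output appears as additional result tables alongside the data.
import "profiler"

option profiler.enabledProfilers = ["query", "operator"]

from(bucket: "mybucket")
    |> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
    |> filter(fn: (r) => r._measurement == "DATA")
    |> filter(fn: (r) => r["code"] == "88820MS")
    |> filter(fn: (r) => r._field == "airPressure")
    |> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
```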

InfluxDB aggregateWindow with spread very slow

Can someone with more experience with InfluxDB tell me why this query takes 20 seconds to execute:
from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
while this one returns results instantly (0.1 s):
from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
The only difference is the aggregation function (spread vs. max). What I'm trying to do is get the daily power consumption / PV production for the last 31 days. I would assume this is a pretty simple query, but I guess I'm wrong about that. Please advise on how to improve performance.
I'm using a dockerized InfluxDB 2.1.1.
EDIT:
I ended up using this crude approach, which gets max and min separately and calculates the spread myself. It runs in 0.12 s.
maxes = from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
mins = from(bucket: "FOO")
|> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
|> filter(fn: (r) => r["_measurement"] == "bar")
|> filter(fn: (r) => r["_field"] == "value")
|> filter(fn: (r) => r["device"] == "baz")
|> filter(fn: (r) => r["type"] == "Import")
|> aggregateWindow(every: 1d, fn: min, createEmpty: false)
join(
tables: {min:mins, max:maxes},
on: ["_time"]
)
|> map(fn: (r) => ({ r with _value: r._value_max - r._value_min}))
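An alternative sketch, under the assumption that the Import field is a cumulative (monotonically increasing) meter reading: take one pushdown-friendly aggregate per day and let difference() compute the daily deltas, avoiding the join entirely. A counter that resets would still need the min/max approach above.

```flux
from(bucket: "FOO")
    |> range(start: 2022-02-11T23:00:00Z, stop: 2022-03-15T22:59:59.999Z)
    |> filter(fn: (r) => r["_measurement"] == "bar")
    |> filter(fn: (r) => r["_field"] == "value")
    |> filter(fn: (r) => r["device"] == "baz")
    |> filter(fn: (r) => r["type"] == "Import")
    |> aggregateWindow(every: 1d, fn: last, createEmpty: false) // pushed down to storage
    |> difference() // daily usage = delta between successive daily readings
```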

Aggregation functions for each field in one query (Flux query language)

I would like to do the following query:
SELECT last(field1), mean(field2), last(field3) FROM mymeasurement WHERE mykey="1" GROUP BY time(1h)
But I do not know how to do this using the Flux query language.
This is my current approach:
table0 = from(bucket: "mybucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mymeasurement")
|> filter(fn: (r) => r["mykey"] == "1")
|> filter(fn: (r) => r["_field"] == "field1")
|> aggregateWindow(every: 1h, fn: last, createEmpty: false)
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
table1 = from(bucket: "mybucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mymeasurement")
|> filter(fn: (r) => r["mykey"] == "1")
|> filter(fn: (r) => r["_field"] == "field2")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
table2 = from(bucket: "mybucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mymeasurement")
|> filter(fn: (r) => r["mykey"] == "1")
|> filter(fn: (r) => r["_field"] == "field3")
|> aggregateWindow(every: 1h, fn: last, createEmpty: false)
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
join(on: ["_time"], tables: {"0":table0,"1":table1, "2":table2})
Is there any better way to do it?
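One possible tidier variant, sketched under the assumption that all three fields live in the same measurement and series: wrap the common part in a function (each call then builds a fresh, pushdown-friendly pipeline), union() the aggregated streams, and pivot once at the end. union() accepts any number of tables, whereas the documented join() takes only two.

```flux
base = () => from(bucket: "mybucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "mymeasurement")
    |> filter(fn: (r) => r["mykey"] == "1")

lasts = base()
    |> filter(fn: (r) => r["_field"] == "field1" or r["_field"] == "field3")
    |> aggregateWindow(every: 1h, fn: last, createEmpty: false)

means = base()
    |> filter(fn: (r) => r["_field"] == "field2")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)

union(tables: [lasts, means])
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
```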

Query last value in Flux

I'm trying to get the last value from some IoT sensors, and I actually achieved an intermediate result with the following Flux query:
from(bucket:"mqtt-bucket")
|> range(start:-10m )
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["thingy"] == "things/green-1/shadow/update"
or r["thingy"] == "things/green-3/shadow/update"
or r["thingy"] == "things/green-2/shadow/update")
|> filter(fn: (r) => r["_field"] == "data")
|> filter(fn: (r) => r["appId"] == "TEMP" or r["appId"] == "HUMID")
|> toFloat()
|> last()
The problem: I would like to get the last measured value independently of a time range.
I saw in the docs that there is no way to make the range unbounded. Maybe there is a workaround?
I just found this:
from(bucket: "stockdata")
|> range(start: 0)
|> filter(fn: (r) => r["_measurement"] == "nasdaq")
|> filter(fn: (r) => r["symbol"] == "OPEC/ORB")
|> last()
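Applied to the sensor query above, the same trick would look like the sketch below. Note that range(start: 0) scans from the Unix epoch to now, so on large buckets it can be expensive; a generous but bounded start (e.g. -30d) is often the better trade-off.

```flux
from(bucket: "mqtt-bucket")
    |> range(start: 0) // from the Unix epoch to now
    |> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
    |> filter(fn: (r) => r["thingy"] == "things/green-1/shadow/update"
        or r["thingy"] == "things/green-3/shadow/update"
        or r["thingy"] == "things/green-2/shadow/update")
    |> filter(fn: (r) => r["_field"] == "data")
    |> filter(fn: (r) => r["appId"] == "TEMP" or r["appId"] == "HUMID")
    |> toFloat()
    |> last()
```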
