Flux Query : iterate filter from a list - influxdb

I am looking for a way to filter by looping over (iterating through) a list.
Is this possible?
The table tophdd contains 2 entries, but I can't filter on these two entries with a regex.
tophdd = from(bucket: v.bucket)
|>range(start: v.timeRangeStart, stop: v.timeRangeStop)
|>filter(fn: (r) => r._measurement == "HDDID")
|>filter(fn: (r) => r.serial == "${Serial}")
|>filter(fn: (r) => r._field == "HDDID_IOPS")
|>highestMax(n:2,groupColumns: ["HDDID"])
|>keep(columns: ["HDDID" ])
from(bucket: v.bucket)
|>range(start: v.timeRangeStart, stop: v.timeRangeStop)
|>filter(fn: (r) => r._measurement == "HDDID")
|>filter(fn: (r) => r.serial == "${Serial}")
|>filter(fn: (r) => r._field == "HDDID_IOPS")
|>filter(fn: (r) => r.HDDID = =~ /"${tophdd}"/)
|>aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean)
I would like to filter like this:
filter(fn: (r) => r.HDDID = =~ /"${tophdd}"/)
Is it possible to filter from a list?
Many thanks,

Looks like you just had a duplicate equal sign ( = = ) there. Try to update the query as follows:
filter(fn: (r) => r.HDDID =~ /"${tophdd}"/)
See more details here.

You can extract the column values into an array with findColumn() and then use the contains() function in the filter, e.g.:
tophdd = from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> ...
|> keep(columns: ["HDDID" ])
|> findColumn(fn: (key) => true, column: "HDDID")
from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "HDDID")
|> ...
|> filter(fn: (r) => contains(set: tophdd, value: r.HDDID))
|> aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean)
Please note that the performance may be suboptimal as contains() is not a pushdown op.
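Putting the filters from the question into this pattern gives roughly the following. This is an untested sketch: it assumes the same dashboard variables as the question (v.bucket, v.timeRangeStart, v.timeRangeStop, v.windowPeriod and ${Serial}) and that highestMax() returns a single table, so that findColumn() with a catch-all predicate picks up both HDDID values.

```flux
// Sketch only: the question's filters combined with findColumn() and contains().

// Array of the two HDDID values with the highest IOPS maximum.
tophdd =
    from(bucket: v.bucket)
        |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
        |> filter(fn: (r) => r._measurement == "HDDID")
        |> filter(fn: (r) => r.serial == "${Serial}")
        |> filter(fn: (r) => r._field == "HDDID_IOPS")
        |> highestMax(n: 2, groupColumns: ["HDDID"])
        // findColumn() reads from the first table whose group key matches fn.
        |> findColumn(fn: (key) => true, column: "HDDID")

// Main query: keep only the rows whose HDDID is in the array.
from(bucket: v.bucket)
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "HDDID")
    |> filter(fn: (r) => r.serial == "${Serial}")
    |> filter(fn: (r) => r._field == "HDDID_IOPS")
    |> filter(fn: (r) => contains(set: tophdd, value: r.HDDID))
    |> aggregateWindow(column: "_value", every: v.windowPeriod, fn: mean)
```

Keeping the cheap equality filters ahead of the contains() filter lets the storage layer discard most rows before the non-pushdown step runs.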

Related

How to return rows where value is 0 and a previous value in the past 7 days was > 0?

I have a measurement for ICMP responses with the fields percent_packet_loss, packets_received, and packets_sent.
I want to query for rows where percent_packet_loss is 100, but only if the ip had less than 100% packet loss at any time in the previous 7 days.
Something like this:
from(bucket: "customerData")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] == 100)
|> keep(columns: ["_value", "ip"])
from(bucket: "customerData")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["ip"] == <ip from previous query>)
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] < 100)
The rows from the first query should be returned only if the second query returns at least one row for that ip.
Can this type of operation be done in Flux, and if so, how?
I believe this solves the request:
alerts = from(bucket: "customerData")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] == 100)
noalerts = from(bucket: "customerData")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "packets_received" and r["_value"] > 0)
join(
tables: {alerts, noalerts},
on: ["ip"],
method: "inner"
)
|> keep(columns:["_time_alerts", "_value_alerts", "_value_noalerts", "account_alerts", "customer_alerts", "model_alerts", "site_alerts", "ip", "vendor_alerts"])
|> distinct(column: "ip")
|> sort(columns: ["_time"], desc: true)
I have refined and confirmed the following query meets the need.
The first query returns all rows with 100% loss. The second query returns rows where packets received were > 0 in the past 7 days. The two queries were then joined on the ip and matching rows from the first query are rendered.
alerts = from(bucket: "customerData")
|> range(start: -5m)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "percent_packet_loss" and r["_value"] == 100)
noalerts = from(bucket: "customerData")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "ping")
|> filter(fn: (r) => r["_field"] == "packets_received" and r["_value"] > 0)
|> distinct(column: "ip")
join(
tables: {alerts, noalerts},
on: ["ip"],
method: "inner"
)
|> keep(columns:["_time", "account_alerts", "_value_alerts", "customer_alerts", "model_alerts", "site_alerts", "ip", "vendor_alerts"])
|> rename(columns: {"account_alerts": "account", "_value_alerts": "percent_packet_loss", "customer_alerts": "customer", "site_alerts": "site", "vendor_alerts": "vendor", "model_alerts": "model"})
|> sort(columns: ["_time"], desc: true)

influxDB - Get daily Max. value

I have data with hydrological measurements.
I want to get the daily max Water flow:
from(bucket: "API")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "hydro")
|> filter(fn: (r) => r["_field"] == "temperature")
|> filter(fn: (r) => r["loc"] == "XXX")
|> aggregateWindow(every: v.windowPeriod, fn: max, createEmpty: false)
|> yield(name: "max")
For some reason, for some days, this returns multiple measurements per day.
But not always.
How do I get only the max entry per day?
You need to set the every parameter of aggregateWindow() to 1d:
from(bucket: "API")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "hydro")
|> filter(fn: (r) => r["_field"] == "temperature")
|> filter(fn: (r) => r["loc"] == "XXX")
|> aggregateWindow(every: 1d, fn: max, createEmpty: false)
|> yield(name: "max")
See the Flux documentation for more details.
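Two details may matter for daily values. By default, windows are aligned to UTC days, and each result is stamped with the end of its window. The sketch below is untested and assumes a Flux version that ships the timezone package; the zone name Europe/Zurich is only a placeholder for wherever the station is.

```flux
import "timezone"

// Assumption: replace the zone name with your own location.
option location = timezone.location(name: "Europe/Zurich")

from(bucket: "API")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "hydro")
    |> filter(fn: (r) => r["_field"] == "temperature")
    |> filter(fn: (r) => r["loc"] == "XXX")
    // timeSrc: "_start" stamps each daily max with the start of its day
    // instead of the default window end ("_stop").
    |> aggregateWindow(every: 1d, fn: max, createEmpty: false, timeSrc: "_start")
    |> yield(name: "max")
```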

Aggregation functions for each field in one query (Flux query language)

I would like to do the following query:
SELECT last(field1), mean(field2), last(field3) FROM mymeasurement WHERE mykey="1" GROUP BY time(1h)
But I do not know how to do this in the Flux query language.
This is my current approach:
table0 = from(bucket: "mybucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mymeasurement")
|> filter(fn: (r) => r["mykey"] == "1")
|> filter(fn: (r) => r["_field"] == "field1")
|> aggregateWindow(every: 1h, fn: last, createEmpty: false)
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
table1 = from(bucket: "mybucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mymeasurement")
|> filter(fn: (r) => r["mykey"] == "1")
|> filter(fn: (r) => r["_field"] == "field2")
|> aggregateWindow(every: 1h, fn: mean, createEmpty: false)
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
table2 = from(bucket: "mybucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mymeasurement")
|> filter(fn: (r) => r["mykey"] == "1")
|> filter(fn: (r) => r["_field"] == "field3")
|> aggregateWindow(every: 1h, fn: last, createEmpty: false)
|> pivot(rowKey:["_time"], columnKey:["_field"], valueColumn:"_value")
join(on: ["_time"], tables: {"0":table0,"1":table1, "2":table2})
Is there any better way to do it?
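One possible simplification, offered as an untested sketch rather than a confirmed answer: share the common filters in one base stream, aggregate the fields that use the same function together, then union the results and pivot once instead of joining three pivoted tables. The names base, lastFields and meanField are made up for illustration, and the sketch assumes all three fields carry the same tag set so that pivot() lines them up in the same rows.

```flux
// Shared part of the query: measurement and key filter.
base =
    from(bucket: "mybucket")
        |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
        |> filter(fn: (r) => r["_measurement"] == "mymeasurement")
        |> filter(fn: (r) => r["mykey"] == "1")

// field1 and field3 both use last(), so they can be aggregated together.
lastFields =
    base
        |> filter(fn: (r) => r["_field"] == "field1" or r["_field"] == "field3")
        |> aggregateWindow(every: 1h, fn: last, createEmpty: false)

// field2 uses mean().
meanField =
    base
        |> filter(fn: (r) => r["_field"] == "field2")
        |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)

// One pivot turns the three fields into three columns per timestamp.
union(tables: [lastFields, meanField])
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
```

Compared with the join, the output keeps the original field names as column names instead of suffixed ones.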

How to get a query variable on InfluxDB 2.0 dashboard?

I read the documentation at https://docs.influxdata.com/influxdb/v2.0/visualize-data/variables/
and thought, great, that will be a piece of cake.
I took a look at an existing query variable named bucket:
buckets()
|> filter(fn: (r) => r.name !~ /^_/)
|> rename(columns: {name: "_value"})
|> keep(columns: ["_value"])
It returns this data:
#group,false,false,false
#datatype,string,long,string
#default,_result,,
,result,table,_value
,,0,pool
,,0,test
The bucket variable works and I can refer to it as v.bucket in the cell queries of any dashboard.
Building on this example I craft the following query:
from(bucket: "pool")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "minerstat")
|> keep(columns: ["account"])
|> distinct(column: "account")
|> keep(columns: ["_value"])
That returns this data:
#group,false,false,false
#datatype,string,long,string
#default,_result,,
,result,table,_value
,,0,0x04ff4e0c05c0feacccf93251c52a78639e0abef4
,,0,0x201f1a58f31801dcd09dc75616fa40e07a70467f
,,0,0x80475710b08ef41f5361e07ad5a815eb3b11ed7b
,,0,0xa68a71f0529a864319082c2475cb4e495a5580fd
And I save it as a query variable with the name account.
Then I use it in a dashboard cell query like this:
from(bucket: "pool")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "minerstat")
|> filter(fn: (r) => r["account"] == v.account)
|> filter(fn: (r) => r["_field"] == "currentHashrate" or r["_field"] == "hashrate")
|> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
|> yield(name: "last")
But this returns no data. And the dropdown menu for the account variable on the dashboard view is empty.
If I replace v.account above with one of the values returned by the query behind the variable:
from(bucket: "pool")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "minerstat")
|> filter(fn: (r) => r["account"] == "0x04ff4e0c05c0feacccf93251c52a78639e0abef4")
|> filter(fn: (r) => r["_field"] == "currentHashrate" or r["_field"] == "hashrate")
|> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
|> yield(name: "last")
That works as intended and displays a nice graph.
What am I missing here?
SOLUTION: you cannot use variables inside the definition of a variable.
I replaced
start: v.timeRangeStart, stop: v.timeRangeStop
with
start: -1d
in the variable definition:
from(bucket: "pool")
|> range(start: -1d)
|> filter(fn: (r) => r._measurement == "minerstat")
|> keep(columns: ["account"])
|> distinct(column: "account")
|> keep(columns: ["_value"])
I don't think you can use variables within variables, so things like v.timeRangeStart that you can use in a dashboard query can't be used to define another dashboard variable.
You can, however, use duration literals such as -5d or -2h in your range() call.

InfluxDB Flux - Getting last and first values as a column

I am trying to create two new columns with the first and last values using the last() and first() functions. However the function isn’t working when I try to map the new columns. Here is the sample code below. Is this possible using Flux?
from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "price_info")
|> filter(fn: (r) => r["_field"] == "price")
|> map(fn: (r) => ({r with
open: last(float(v: r._value)),
close: first(float(v: r._value)),
}))
This does not answer the question directly, but it might help.
I wanted to perform a calculation between the first and last values. Here is my method; I have no idea whether it is the right way to do it.
The idea is to create 2 tables, one with only the first value and the other with only the last value, and then perform a union of the two.
data = from(bucket: "bucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "plop")
l = data
|> last()
|> map(fn:(r) => ({ r with _time: time(v: "2011-01-01T01:01:01.0Z") }))
f = data
|> first()
|> map(fn:(r) => ({ r with _time: time(v: "2010-01-01T01:01:01.0Z") }))
union(tables: [f, l])
|> sort(columns: ["_time"])
|> difference()
For a reason I don't understand, I have to set fake dates so that the values can be sorted with first before last.
Just a quick thank you. I was struggling with this as well. This is my code now:
First = from(bucket: "FirstBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["topic"] == "Counters/Watermeter 1")
|> filter(fn: (r) => r["_field"] == "Counter")
|> first()
|> yield(name: "First")
Last = from(bucket: "FirstBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "mqtt_consumer")
|> filter(fn: (r) => r["topic"] == "Counters/Watermeter 1")
|> filter(fn: (r) => r["_field"] == "Counter")
|> last()
|> yield(name: "Last")
union(tables: [First, Last])
|> difference()
The simple answer is to use join. (You may also use the old join(); when using the "new" join, remember to import "join".)
Example:
import "join"
balance_asset_gen = from(bucket: "telegraf")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "balance")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
balance_asset_raw = from(bucket: "telegraf")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "balance_raw")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
// In my example I merge two data sources but you may just use 1 data source
balances_merged = union(tables: [balance_asset_gen, balance_asset_raw])
|> group(columns:["_time"], mode:"by")
|> sum()
f = balances_merged |> first()
l = balances_merged |> last()
// Watch out, here we assume we work on single TABLE (we don't have groups/one group)
join.left(
left: f,
right: l,
on: (l, r) => l.my_tag == r.my_tag, // pick on what to merge e.g. l._measurement == r._measurement
as: (l, r) => ({
_time: r._time,
_start: l._time,
_stop: r._time,
_value: (r._value / l._value), // we can calculate new field
first_value: l._value,
last_value: r._value,
}),
)
|> yield()
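Another option that gets both values as columns in a single pass is reduce(). This is an untested sketch using the measurement and field names from the question. The column names first_value and last_value and the helper flag seen are made up for illustration, and it assumes the rows of each table arrive in time order (add sort(columns: ["_time"]) before reduce() if they might not).

```flux
from(bucket: "bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "price_info")
    |> filter(fn: (r) => r["_field"] == "price")
    |> reduce(
        // seen records whether the first row has already been captured.
        identity: {first_value: 0.0, last_value: 0.0, seen: false},
        fn: (r, accumulator) =>
            ({
                // Keep the value of the first row; later rows leave it unchanged.
                first_value: if accumulator.seen then accumulator.first_value else float(v: r._value),
                // Overwritten on every row, so it ends up holding the last value.
                last_value: float(v: r._value),
                seen: true,
            }),
    )
    |> drop(columns: ["seen"])
```

The result is one row per input table holding the group key columns plus first_value and last_value. There is no _time column in the output, so it suits a table view better than a graph.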

Resources