InfluxDB Flux query: filter by time

I have this Flux aggregation query:
from(bucket: db)
|> range(start: dateFrom, stop: dateTo)
|> filter(fn: (r) => r._measurement == tableName and (r._field == "value"))
|> filter(fn: (r) => (r.sensorId == "sensor1")
or (r.sensorId == "sensor2")
or (r.sensorId == "filter_sensor"))
|> aggregateWindow(every: 30s, fn: mean)
|> keep(columns: ["_time", "sensorId", "_value"])
|> pivot(rowKey:["_time"], columnKey: ["sensorId"], valueColumn: "_value")
|> yield(name:"result")
Before aggregating, I need to remove the rows for all tags at every _time where "filter_sensor" equals 1. A value of 1 marks the data as invalid, so that timestamp must be ignored in the aggregation.
How can I do this?
Thanks for reading and responses.
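One possible approach, sketched here rather than taken from the thread, is to collect the timestamps at which "filter_sensor" reports 1 and drop every row with one of those timestamps before calling aggregateWindow(). The sketch assumes that all three sensors are written with exactly the same _time values and that _value is a float; `data` and `invalidTimes` are names introduced only for illustration.

```flux
// Raw points for the three sensors (same filters as in the question).
data = from(bucket: db)
    |> range(start: dateFrom, stop: dateTo)
    |> filter(fn: (r) => r._measurement == tableName and r._field == "value")
    |> filter(fn: (r) => r.sensorId == "sensor1" or r.sensorId == "sensor2" or r.sensorId == "filter_sensor")

// Timestamps at which filter_sensor reports 1 (assumption: 1 marks invalid data).
// group() merges everything into one table so findColumn() sees all rows.
invalidTimes = data
    |> filter(fn: (r) => r.sensorId == "filter_sensor" and r._value == 1.0)
    |> group()
    |> findColumn(fn: (key) => true, column: "_time")

// Drop the invalid timestamps for every sensor, then aggregate as before.
data
    |> filter(fn: (r) => not contains(value: r._time, set: invalidTimes))
    |> aggregateWindow(every: 30s, fn: mean)
    |> keep(columns: ["_time", "sensorId", "_value"])
    |> pivot(rowKey: ["_time"], columnKey: ["sensorId"], valueColumn: "_value")
    |> yield(name: "result")
```

contains() is evaluated once per row, so this is likely to be slow when the list of invalid timestamps is long. If the sensors do not share exact timestamps, the raw data would first have to be aligned, which this sketch does not attempt.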

Related

Why is this Flux query faster?

The following query...
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
|> filter(fn: (r) => r._measurement == "DATA")
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
is much faster (about 100 ms vs. 3000 ms, a factor of 30) than this equivalent query (on InfluxDB Cloud):
basis = from(bucket: "mybucket")
|> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
DATA = basis
|> filter(fn: (r) => r._measurement == "DATA")
DATA
|> filter(fn: (r) => r["code"] == "88820MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_88820MS")
DATA
|> filter(fn: (r) => r["code"] == "86900MS")
|> filter(fn: (r) => r._field == "airPressure")
|> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
|> sort(columns: ["_time"])
|> yield(name:"PUB_86900MS")
What is the reason? I would expect the second query to be faster (or at least just as fast), since I naively assumed it is better optimized because the common filtered data DATA is reused. Instead, this seems to confuse InfluxDB and stop pushdown processing, making the second query slower.
Why is that?
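No answer is recorded for this question. One way to test the asker's own suspicion is to keep the query text free of repetition without sharing a stream variable: wrap the pipeline in a function so that each call builds its own from() |> range() |> filter() chain, and compare timings using the profiler package. `dailyMean` is a hypothetical helper name, and whether the planner pushes the filters down in this form is an assumption to check in the profiler output, not a guarantee.

```flux
import "profiler"

// Report per-query and per-operator statistics alongside the results.
option profiler.enabledProfilers = ["query", "operator"]

// Hypothetical helper: every call constructs an independent pipeline,
// so no stream is shared between the two yields.
dailyMean = (code) =>
    from(bucket: "mybucket")
        |> range(start: 2021-10-01T21:16:00.000Z, stop: 2022-10-01T21:16:00.000Z)
        |> filter(fn: (r) => r._measurement == "DATA")
        |> filter(fn: (r) => r["code"] == code)
        |> filter(fn: (r) => r._field == "airPressure")
        |> aggregateWindow(every: 86400s, fn: mean, createEmpty: false)
        |> sort(columns: ["_time"])

dailyMean(code: "88820MS") |> yield(name: "PUB_88820MS")
dailyMean(code: "86900MS") |> yield(name: "PUB_86900MS")
```

Running the original two variants with the same profiler option should show which operators run in storage and which run in the Flux engine.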

InfluxDB: return 0 instead of 'no results'

We are using InfluxDB for statistics and dashboards. We love it! It is blazing fast and easy to integrate. However, we get stuck when we launch new features.
We have the following Flux query against a massive database that holds all "model_events", keyed by business UUID. If the business has no car.created events at all, the query returns no results instead of a range of 0s. If it has at least one car.created event, even outside the queried range, it returns a range of 0s. Is it possible to always get the range, even if the _measurement has no values?
from(bucket: "_events")
|> range(start: 2022-09-01, stop: 2022-09-11)
|> filter(fn: (r) => r["_measurement"] == "car.created")
|> filter(fn: (r) => r["business_uuid"] == "055ade92-ecd9-47b1-bf85-c1381d0afd22")
|> aggregateWindow(every: 1d, fn: count, createEmpty: true)
|> yield(name: "amount")
By the way, I'm a bit new to InfluxDB.
Maybe you could create a dummy table and union() it like:
import "experimental/array"
rows = [{_time: now(), _field: "someField", _value: 0}]
dummy = array.from(rows: rows)
data = from(bucket: "_events")
|> range(start: 2022-09-01, stop: 2022-09-11)
|> filter(fn: (r) => r["_measurement"] == "car.created")
|> filter(fn: (r) => r["business_uuid"] == "055ade92-ecd9-47b1-bf85-c1381d0afd22")
|> aggregateWindow(every: 1d, fn: count, createEmpty: true)
union(tables: [dummy, data])
|> yield(name: "amount")

What does `|>` mean in TICKscript

I'm trying to write my first TICKscript to work out when two sensor values cross: if the outside temperature has changed from lower to higher than the inside temperature, then I need to close the windows (and conversely).
Using the query builder in InfluxDB, I'm getting this for the median of the temperature values inside the house over the last 15 minutes:
from(bucket: "zigbee")
|> range(start: -15m, stop: now())
|> filter(fn: (r) => r["room"] == "Kitchen" or r["room"] == "DiningRoom" or r["room"] == "Bed3" or r["room"] == "Bed1")
|> filter(fn: (r) => r["_field"] == "temperature")
|> group(columns: ["_measurement"])
|> aggregateWindow(every: 15m, fn: mean, createEmpty: false)
|> yield(name:"inside")
The syntax |> appears to be undocumented. Can you provide a reference?
Replacing |> with | breaks it.
It seems that group and aggregateWindow do not commute?
Presumably because aggregateWindow is forced to choose a single representative _time value for each window?
I think the plan is to:
assign this to a stream,
copy and edit it to create a second stream shifted by 15 minutes,
create a second pair of streams for the outside temperature,
join all four streams and calculate a value indicating whether the inside and outside temperatures have crossed over.
Unless you have a better idea?
(Right now it's looking easier to import the data into SQL.)
Check the InfluxDB Flux language documentation for |>:
InfluxDB Pipe-forward operator
According to your flux syntax query:
from(bucket: "zigbee")
|> range(start: -15m, stop: now())
|> filter(fn: (r) => r["room"] == "Kitchen" or r["room"] == "DiningRoom" or r["room"] == "Bed3" or r["room"] == "Bed1")
|> filter(fn: (r) => r["_field"] == "temperature")
|> group(columns: ["_measurement"])
|> aggregateWindow(every: 15m, fn: mean, createEmpty: false)
|> yield(name:"inside")
You are taking data from the bucket "zigbee".
The data from the source is passed to the range function with the pipe-forward operator |>.
The output of range is passed to the next function, filter, with another pipe-forward operator.
And so on.
So the data flows as the result of one function into the next.
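As a small illustration of that flow, the sketch below defines a custom function with a piped parameter and calls it both ways. The sample rows and the `toFahrenheit` helper are hypothetical and only serve to show that `data |> f()` hands `data` to the parameter declared with `<-`.

```flux
import "array"

// Hypothetical sample stream containing a single table.
data = array.from(
    rows: [
        {_time: 2022-01-01T00:00:00Z, _value: 20.5},
        {_time: 2022-01-01T00:15:00Z, _value: 21.0}
    ],
)

// Hypothetical helper; `tables=<-` marks the parameter that receives piped input.
toFahrenheit = (tables=<-) =>
    tables
        |> map(fn: (r) => ({r with _value: r._value * 9.0 / 5.0 + 32.0}))

// Both calls pass `data` to the same parameter.
data |> toFahrenheit() |> yield(name: "piped")
toFahrenheit(tables: data) |> yield(name: "explicit")
```

The piped form reads top to bottom in the order the data is processed, which is why Flux queries are normally written that way.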
You can group the data, but if I understand your intention correctly, the column to group by in your case is the "room" tag, so try:
|> group(columns: ["room"])
There is a difference between tag key values and measurement names. You should check the InfluxDB documentation to understand the data structure.
Flux data model documentation
It's not TICKscript; it's Flux, the InfluxDB query language.
mean = from(bucket: "zigbee")
|> range(start: -5d, stop: now())
|> filter(fn: (r) => r["room"] == "Outside")
|> filter(fn: (r) => r["_measurement"] == "temperature")
|> aggregateWindow(every: 30m, fn: mean, createEmpty: false)
shift = mean
|> timeShift(duration: -3h)
j = join(tables: {mean: mean, shift: shift}, on: ["_time"])
|> map(fn: (r) => ({ r with diff: float(v: r._value_mean) - float( v: r._value_shift) }))
// yield contains 1 table with the required columns, but the UI doesn't understand it.
// The UI requires 1 table for each series.
j |> map(fn: (r) => ({_time: r._time, _value: r._value_mean})) |> yield(name: "mean")
j |> map(fn: (r) => ({_time: r._time, _value: r._value_shift})) |> yield(name: "shift")
j |> map(fn: (r) => ({_time: r._time, _value: r.diff})) |> yield(name: "diff")
In TICKscript, the chaining operator is | (not |>); it "Declares a chaining method call which creates an instance of a new node and chains it to the node above it," according to the official documentation. The |> in the query above is Flux's pipe-forward operator.

InfluxDB 2.0 - Flux query: How to sum a column and use the sum for further calculations

I am new to the Flux query language (with InfluxDB 2) and can't find a solution for the following problem:
I have data with changing true and false values:
I was able to calculate the time in seconds until the next change by using the events.duration function:
Now I want to calculate the total time and the time of all "false" events, and after that the percentage of time spent in "false" events. I tried the following:
import "contrib/tomhollingworth/events"
total = from(bucket: "********")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "********")
|> filter(fn: (r) => r["Server"] == "********")
|> filter(fn: (r) => r["_field"] == "********")
|> filter(fn: (r) => r["DataNode"] == "********")
|> events.duration(
unit: 1s,
columnName: "duration",
timeColumn: "_time",
stopColumn: "_stop"
)
|> sum(column: "duration")
|> yield(name: "total")
downtime = from(bucket: "********")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "********")
|> filter(fn: (r) => r["Server"] == "********")
|> filter(fn: (r) => r["_field"] == "********")
|> filter(fn: (r) => r["DataNode"] == "********")
|> events.duration(
unit: 1s,
columnName: "duration",
timeColumn: "_time",
stopColumn: "_stop"
)
|> pivot(rowKey:["_time"], columnKey: ["_value"], valueColumn: "duration")
|> drop(columns: ["true"])
|> sum(column: "false")
|> yield(name: "downtime")
downtime_percentage = downtime.false / total.duration
With this I get the following error: error #44:23-44:31: expected {A with false:B} but found [C]
I also tried some variations but couldn't get them to work.
I guess I am getting some basic things wrong, but I couldn't figure it out yet. Let me know if you need more information.
I have found a way to solve my problem. Although I am sure that there is a more elegant solution, I document my way here, maybe it helps someone and we can improve it together.
import "contrib/tomhollingworth/events"
//Set time window in seconds (based on selected time)
time_window = int(v: v.timeRangeStart)/-1000000000
//Filter (IoT-)Data
data= from(bucket: "*******")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "*******")
|> filter(fn: (r) => r["Server"] == "*******")
|> filter(fn: (r) => r["Equipment"] == "*******")
|> filter(fn: (r) => r["DataNode"] == "******")
//Use events.duration to calculate the duration in seconds of each true/false event.
|> events.duration(
unit: 1s,
columnName: "duration",
timeColumn: "_time",
stopColumn: "_stop"
)
//Sum up the event times via "sum()" and save them as an array variable via "findColumn()". This is the only way to access the value later (As far as I know. please let me know if you know other ways!).
total_array = data
|> sum(column: "duration")
|> findColumn(
fn: (key) => key._field == "*******",
column: "duration",
)
//Calculate "missing time" in seconds in the time window, because the first event in the time window is missing.
missing_time = time_window - total_array[0]
//Create an array with the first event to determine if it is true or false
first_value_in_window = data
|> first()
|> findColumn(
fn: (key) => key._field == "*******",
column: "_value",
)
//Calculate the downtime by creating columns with the true and false values via pivot. Then sum up the column with the false values
downtime = data
|> map(fn: (r) => ({ r with duration_percentage: float(v: r.duration)/float(v: time_window) }))
|> pivot(rowKey:["_time"], columnKey: ["_value"], valueColumn: "duration_percentage")
|> map( fn: (r) => ({r with
downtime: if exists r.false then
r.false
else
0.0
}))
|> sum(column: "downtime")
//Create an array with the downtime so that this value can be accessed later on
downtime_array = downtime
|> findColumn(
fn: (key) => key._field == "PLS_Antrieb_laeuft",
column: "downtime",
)
//If the first value in the considered time window is true, then the remaining time in the time window (missing_time) was downtime. Write this value in the column "false_percentage_before_window".
//The total downtime is calculated from the previously calculated sum(downtime_array) and, if applicable, the downtime of the remaining time in the time window if the first value is true (first_value_in_window[0])
data
|> map( fn: (r) => ({r with
false_percentage_before_window: if first_value_in_window[0] then
float(v: missing_time)/float(v: time_window)
else
0.0
}))
|> map(fn: (r) => ({ r with _value: (downtime_array[0] + r.false_percentage_before_window) * 100.00 }))
|> first()
|> keep(columns: ["_value"])
|> yield(name: "Total Downtime")
This solution assumes that the true/false events only occur alternately.
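The error in the question arises because `total` and `downtime` are streams of tables, not records, so `total.duration` has no meaning. Besides findColumn(), another way to pull a scalar out of a stream is findRecord(), which returns a single row as a record. The sketch below shows the idea on hypothetical sample rows built with array.from(); the `state` column and the numbers are invented for illustration and stand in for the output of events.duration().

```flux
import "array"

// Hypothetical per-event durations in seconds, labelled with the state they belong to.
events = array.from(
    rows: [
        {state: "true", duration: 60},
        {state: "false", duration: 30},
        {state: "true", duration: 90},
        {state: "false", duration: 20}
    ],
)

// Sum over all events, extracted as a record so its fields can be used in expressions.
total = events
    |> sum(column: "duration")
    |> findRecord(fn: (key) => true, idx: 0)

// Sum over the "false" events only.
down = events
    |> filter(fn: (r) => r.state == "false")
    |> sum(column: "duration")
    |> findRecord(fn: (key) => true, idx: 0)

// Plain scalar arithmetic now works.
percentage = float(v: down.duration) / float(v: total.duration) * 100.0

// Wrap the scalar in a one-row table so it can be returned as a query result.
array.from(rows: [{_value: percentage}])
    |> yield(name: "downtime_percentage")
```

Like the solution above, this does not account for the time before the first event in the window; that correction would still have to be added separately.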

How to merge (join) two tables in a specific way in Grafana using InfluxDB flux query?

Grafana: 7.1.5
InfluxDB: 1.8
I currently have three separate table panels in Grafana where the only difference between each query is the time range (Year, Month, Day). I would like to combine these three tables into one, where the measurement's value is separated into three columns (one for each time range).
More explicitly, what I have currently is:
Table1 Columns: [Tag1+Tag2, _value] where _value is the units this year
Table2 Columns: [Tag1+Tag2, _value] where _value is the units this month
Table3 Columns: [Tag1+Tag2, _value] where _value is the units this day
What I want is:
Table Columns: [Tag1+Tag2, Table1_value (Year), Table2_value (Month), Table3_value (Day)]
These are my queries:
import "date"
thisYearSoFar = date.truncate(t: now(), unit: 1y)
thisMonthSoFar = date.truncate(t: now(), unit: 1mo)
thisDaySoFar = date.truncate(t: now(), unit: 1d)
from(bucket: "consumption")
|> range(start: thisYearSoFar, stop: now())
|> filter(fn: (r) => r._measurement == "stuff" and r._field == "units" and r._value > 0)
|> group(columns: ["datacenter","tenant"])
|> sum(column: "_value")
|> map(fn: (r) => ({r with _value: r._value / 4.0}))
from(bucket: "consumption")
|> range(start: thisMonthSoFar, stop: now())
|> filter(fn: (r) => r._measurement == "stuff" and r._field == "units" and r._value > 0)
|> group(columns: ["datacenter","tenant"])
|> sum(column: "_value")
|> map(fn: (r) => ({r with _value: r._value / 4.0}))
from(bucket: "consumption")
|> range(start: thisDaySoFar, stop: now())
|> filter(fn: (r) => r._measurement == "stuff" and r._field == "units" and r._value > 0)
|> group(columns: ["datacenter","tenant"])
|> sum(column: "_value")
|> map(fn: (r) => ({r with _value: r._value / 4.0}))
I've tried joining these tables in various ways, but nothing I'm doing is working properly to get me the one table with 4 columns that I'm looking for.
Anyone have ideas on how to achieve this? Thanks!
I worked with a Flux developer who helped me come up with the solution:
import "date"
sum_over_range = (unit) =>
from(bucket: "consumption")
|> range(start: date.truncate(t: now(), unit: unit))
|> filter(fn: (r) => r._measurement == "stuff" and r._field == "units" and r._value > 0)
|> group(columns: ["datacenter", "tenant"])
|> sum()
|> map(fn: (r) => ({r with _value: r._value / 4.0, _field: string(v: unit), _time: 0}))
union(tables: [sum_over_range(unit: 1y), sum_over_range(unit: 1mo), sum_over_range(unit: 1d)])
|> group(columns: ["datacenter", "tenant"])
|> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
|> drop(columns: ["_time", "_start", "_stop", "result"])
|> group()
Then additionally in Grafana, I had to apply the 'Filter by name' transformation to hide the 'result' and 'table' columns that showed.

Resources